INTRODUCTION


During the past decades, statistics has been flourishing on the medical 
scene. Cooperative studies to evaluate methods of treatment for rheumatic 
fever, field trials of the effectiveness of poliomyelitis vaccine and of fluorides in 
water as a preventive measure against tooth decay, investigations of cigarette 
smoking as a cause of lung cancer—these and many other important endeavors 
have earned respect for the contribution statisticians can make to medical re- 
search. Recognizing that the definiteness of conclusions from research depends 
largely upon the excellence of the statistical design and that this in turn depends 
upon the clearness with which the initial question is stated, medical scientists are 
tending to collaborate more closely with statistical colleagues. Medical school 
faculties, schools of public health, governmental and voluntary health agencies, 
medical care organizations, and others concerned with the advancement of 
medical knowledge have entered into intense competition for the relatively few 
well-trained statisticians who have an interest in the medical field. 

Nevertheless, considerable confusion still prevails in medical circles as to 
the benefits and limitations of applying modern statistics to the problems of 
disease causation and treatment. The word “modern” is used advisedly to empha- 
size that progress has been and is being made in statistics itself. 

It therefore seems timely to devote an issue of the JOURNAL OF CHRONIC 
DISEASES to this symposium on statistical problems in medicine. 

Statistics in medicine is, of course, not new. More than a century ago, Sir 
John Snow applied statistical data and reasoning in determining that cholera is 
spread by polluted water. Snow arrived at this conclusion several years before 
Pasteur enunciated the germ theory and several decades before the cholera 
bacillus was identified, by tabulating the cases of cholera among users of two 
London water supplies. Both supplies, in 1849, were taken from the grossly 
polluted lower Thames, and both sets of users suffered high rates of cholera. 
By 1854, one company, the Lambeth, had moved its intake upstream above the 
gross sewage pollution whereas the other, Southwark and Vauxhall, did not move 
its intake. Frost has summarized the key phase of the study¹: “By a personal 
investigation, with some aid from an assistant, Snow ascertained the source of 
the water actually supplied to each house in which a death from cholera occurred 
in the south districts of London during the first seven weeks of the epidemic. 
He then ascertained the total number of houses supplied by each company and 
presented the following comparison of mortality: 


                                  NUMBER OF    DEATHS FROM    DEATHS IN EACH
WATER SUPPLY                      HOUSES       CHOLERA        10,000 HOUSES

Southwark and Vauxhall Company     40,046        1,263             315
Lambeth Company                    26,107           98              37
Other districts of London         256,423        1,422              59


“Thus he showed that in this area one group of the population, using a 
grossly polluted water supply, suffered a cholera mortality rate nine times that 
of the other group, consisting of the same kind of people, residing in the same 
area, living often in adjacent houses, and subject to an environment differing, 
so far as could be ascertained, only in the circumstance that their usual water 
supply came from a different and less polluted source. Two years later, after his 
observations had been confirmed and extended by an official inquiry, he was 
able to show that this disparity of mortality was consistent in every one of the 
twenty-three subdistricts in which the two supplies were mingled.” 
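
The comparison Frost describes rests on a simple rate: deaths per 10,000 houses supplied. The following is a minimal sketch of that arithmetic, assuming the house and death counts of Snow's published 1854 table for the two companies (40,046 houses and 1,263 deaths; 26,107 houses and 98 deaths). The function name is illustrative only.

```python
def deaths_per_10000(deaths, houses):
    """Deaths per 10,000 houses supplied by one water company."""
    return 10_000 * deaths / houses

# Figures assumed from Snow's 1854 table; not computed in the text itself.
southwark_vauxhall = deaths_per_10000(deaths=1263, houses=40046)
lambeth = deaths_per_10000(deaths=98, houses=26107)

# Ratio of the two mortality rates (the text rounds this to "nine times").
rate_ratio = southwark_vauxhall / lambeth
```

With these figures the rates come to roughly 315 and 37.5 per 10,000 houses, a ratio between eight and nine.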

Snow was a pioneer in the application of statistical reasoning to problems 
in medicine. Many others have followed Snow, among them Goldberger in his 
work on pellagra and Collis in his work on silicosis.* While the underlying logic 
of the statistical approach remains the same today as it was in their time, much 
progress in the theory of statistics has been made in recent years. New sampling 
techniques and methods of estimation have been developed and new means of 
identifying and analyzing the sources of variation in multifactor situations have 
come into being. The handling of large masses of data has been enormously 
improved. All these things mean that the investigator has at hand more powerful 
and sensitive tools to aid him in determining the likelihood that a particular 
outcome of treating a group of patients is the result of treatment, the result of 
other factors, or a result of chance; whether a characteristic is truly associated 
with the occurrence of disease or an artefact of observation. These are funda- 
mental questions. To ignore them and to scoff at the use of statistics in medical 
research as “mere statistics” is to ignore the systematic observation and measure- 
ment of phenomena which are the very basis of science. 

*See reference 2 for additional examples.

The real contribution of statistics to medicine is far more than a matter of 
technique. Statistics in the service of medicine provides a means of identifying 
and quantitating the sources of variation. The work of Snow clearly illustrates 
this. But today, when chronic diseases are of so great importance, it may also 
be said that the control of these diseases rests very largely upon the degree to 
which such identification and quantification is achieved. It is as an aid in the 
unraveling of the natural history of disease that the real significance of statistics 
in medicine lies. The symposium presented in this issue has been developed from 
this point of view. 

Basic to the study of the natural history of disease is the process of classi- 
fication. The orderly arrangement of facts so as to bring out the relation- 
ships among them is fundamental to any scientific inquiry. It is precisely such 
orderly arrangement which is the essence of classification and underlies the 
clinician's concern with the problems of diagnosis and prognosis, the evaluation 
of therapy, the estimation of disease incidence, and the determination of the 
epidemiologic aspects of disease. 

The importance of classification in attempting to unravel the natural history 
of a disease lies in the fact that it defines the scope and area of inquiry. Once 
the classifications and definitions are fixed, the investigator will not find it 
possible to obtain more information than that provided by the rubrics of the 
classification. Therefore, the appropriateness of any classification to the pur- 
poses of the study must be considered in each instance. The omission of a detail 
may well make it necessary to go back and reclassify the data in the final stages 
of the study. In some cases, particularly if the data are precoded, this may not 
be possible and the results of the study may not be as significant as they might 
have been. 

It is the purpose of this paper to discuss some of the classification problems 
involved in the study of disease. The general principles of classification will 
also be presented. 


CLASSIFICATIONS IN THE STUDY OF DISEASE 


The primary principle of classification is that the type of classification to 
be used depends upon the purpose of the study and the nature of the data to be 
classified. If one were engaged in the study of the incidence of certain conditions 
in a community using data collected from lay respondents in a household survey, 
it would soon be apparent that a precise diagnostic classification would not be 
suitable without major adaptations. There would be too many medical com- 
plaints in lay terms which would not fit into a detailed diagnostic classification. 

*Mortality Analysis Branch

There are many kinds of classifications involved in various aspects of the 
study of disease. For each of these classifications, there is no single arrangement 
which is universally correct. From the standpoint of comparability, it would be 
desirable to adopt an established classification which is widely used. It is more 
important, however, that the classification be selected to make possible the 
production of information directly related to the purposes of the study. 

The following are examples of various types of classification problems en- 
countered in different kinds of studies of disease. 

Clinical Study.—It is often said that modern clinical medicine was founded 
by Thomas Sydenham. “What Sydenham saw above all in the patient—what he 
wrenched forth to contemplate was the typical, the pathological process which he 
had observed in others before and expected to see in others again [italics mine]. 
Hippocrates wrote the histories of such persons, but Sydenham wrote the history 
of diseases.”¹ Here we have an early example of the process of classification 
applied to Sydenham’s detailed observations on the patient to bring order out 
of the symptomatic and diagnostic anarchy which then prevailed. 

Today, the physician reaches a diagnosis by following somewhat the same 
path as did Sydenham. When a physician observes a patient he notes the presence 
or absence of various clinical signs and pathologic conditions, in addition to 
certain physical and personal characteristics of the patient. A diagnosis is made 
when these observations fit the diagnostic criteria for a particular disease. Ad- 
mittedly this is an oversimplification of the process of arriving at a diagnosis, 
since not all findings are typical or clear cut; neither is it always possible to 
secure all the information needed for a positive diagnosis. The point is, however, 
that the establishment of a diagnosis, whether tentative or definitive, is essentially 
a classification procedure, and the classification employed is a set of diagnostic 
criteria. 

While a disease entity is defined by the diagnostic criteria used, for conven- 
ience of recording or discussion, a disease must be called by a name. This has 
always been a problem because the same disease may be called by several different 
names. In order to avoid confusion in medical terminology, there have been 
developed lists of approved medical terms to describe each recognized diagnostic 
entity. The Standard Nomenclature of Diseases and Operations’ constitutes the 
source of approved medical terminology in the United States. 

A medical nomenclature, to be useful, must be arranged in an orderly fashion 
and the terms must be clear and unambiguous. As new diagnostic entities are 
described, the nomenclature must accommodate them. On the other hand, as 
medical knowledge increases some of the terms become obsolete. These must be 
replaced by more precise or appropriate terminology in the light of current 
medical knowledge. 

Study of Prognosis.—Once having established the diagnosis, the physician's 
next concern is with prognosis. Here consideration must be given to all the sig- 
nificant factors that determine the outcome of the illness. Among these factors 
are the course of treatment including the kind of drugs used, dosage and frequency 
of the dose, stage of the disease, personal characteristics such as age and sex, 
etc., which may affect the measurements selected as the criteria of success or 
failure of the treatment. All of these may raise problems of classification. 
Consider, for example, the relatively simple matter of age. Several questions 
may arise here. What age is pertinent to the objectives of the study? Age at first 
recognition of symptoms? Age at diagnosis? Age at operation? Or, in certain 
types of follow-up studies, age at recall for examination? Clearly the answer is 
dependent upon the purposes of the study. These need to be explicitly stated 
before the collection of the data, and their implications for gathering the neces- 
sary information must be carefully evaluated. 

Age is a quantitative variable, that is, it can be expressed in numerical terms 
on a scale. Similarly, the dosage or frequency of dose may also be classified along 
a quantitative axis, but the stage of disease is ordinarily described in terms such 
as “minimal,” “moderate,” and “far advanced” or “early” and “late,” etc. This 
is a classification of attributes. Sometimes a pseudoquantitative scale is attempted, 
such as when cases are graded 1 plus, 2 plus, 3 plus, or 4 plus. This is not really 
a quantitative scale because the relationship between one part of the “scale” 
and another is not precisely defined. (Is 4 plus twice as bad, or good, as 2 plus?) 

The problem of arriving at classifications of nonquantitative variables which 
are reliable, that is, can be reasonably well duplicated by different investigators, 
is one of the most vexing difficulties confronting the student of the natural 
history of disease. It is well illustrated in the paper by Birkelo, Yerushalmy, and 
colleagues® on the problem of reading x-ray films in the study of tuberculosis. 
It is discussed elsewhere in this symposium by Mainland. In fact, if laboratory 
tests or other means could be devised to change the classification scheme from 
the attribute type to the quantitative type, the study of prognosis would be 
greatly enhanced. 
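
One simple way to see how well an attribute classification is duplicated by different investigators is to compare two readers' gradings of the same cases. The sketch below computes raw percent agreement; the gradings are invented for illustration and are not data from the paper cited above. Raw agreement makes no allowance for agreement that would occur by chance.

```python
def percent_agreement(reader_a, reader_b):
    """Percentage of cases on which two readers assign the same class."""
    if len(reader_a) != len(reader_b):
        raise ValueError("both readers must grade the same cases")
    if not reader_a:
        raise ValueError("at least one case is required")
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100 * matches / len(reader_a)

# Hypothetical stage-of-disease gradings of five films by two readers.
reader_a = ["minimal", "moderate", "far advanced", "moderate", "minimal"]
reader_b = ["minimal", "far advanced", "far advanced", "moderate", "moderate"]

agreement = percent_agreement(reader_a, reader_b)  # readers agree on 3 of 5 films
```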

In launching his studies of prognosis, the investigator probably begins by 
asking the medical record librarian to pull all the charts with the desired diagnosis. 
This raises another classification problem. If a clinician were looking for cases 
involving a specific disease for study purposes, it would be unlikely that the 
hospital record room would be able to furnish him with all the cases pertinent to 
his study by pulling all records indexed under that particular disease. This is 
because the classification represented by the nomenclature is too fine and es- 
sentially similar cases may be classified elsewhere due to slight differences in 
interpretation or because of certain atypical signs. 

Some form of grouping of diagnostic terms in the nomenclature is necessary 
if the clinician is to get all the cases meeting the criteria for inclusion in the 
study. This may be accomplished in several ways. The physician making the 
request may go over the Standard Nomenclature and designate all the rubrics 
which might contain cases of the type he desires. Or, the medical record librarian 
may maintain the diagnostic index itself in some form of grouping of the Standard 
Nomenclature terms; this will probably result in the investigator getting more 
charts than fit his criteria, so that he must screen them to be sure they belong in 
the study. But, after all, it is better that the clinician do this than the medical 
record librarian. Such a procedure tends to ensure that the investigator will 
obtain all the cases that belong in his study. Still another grouping for purposes 
of maintaining a diagnostic index is based upon a modification of the International 
Statistical Classification of Diseases, Injuries, and Causes of Death* (vide infra). 

The Distribution of Disease in the Population.—If the natural history of 
disease is considered to embrace its distribution in the population and identifica- 
tion of the factors influencing that distribution, we shall be concerned with 
counting the number of cases in the population. 

The count which is obtained will depend to a large degree upon the definition 
of “illness” or “case” which is employed. Therefore, in screening the population 
for cases of a specified disease, say, tuberculosis, the count will be different if those 
included in the study are tuberculin positive, are sputum positive, or show radio- 
logic evidence of tuberculosis. Similar statements can obviously be made about 
other chronic diseases. Clearly it is essential in any population survey to arrive 
at a definition of a case on which there is agreement. Which particular definition 
is employed will depend upon the purposes of the survey. In any case it may be 
noted that the count is more easily made if the classification is on a quantitative 
scale than when the definition of the cases is in terms of attributes. 

Another type of classification problem encountered in population surveys 
is that there will be individuals who have more than one illness, with whom, 
therefore, different diseases are involved. These concurrent illnesses will need 
to be handled differently, depending upon the purposes of the survey. If the 
objective is to determine the prevalence of specific diseases, it will be necessary 
to count each disease separately. On the other hand, if it is desired to determine 
how many persons in the population are ill, irrespective of from what they are 
ill, the count is of persons, not diseases. Finally, if the objective is to determine 
the incidence of the disease, where incidence is measured by the number of new 
cases developing in a given period of time per unit of population, it will probably 
be found easier to define a new case as a case newly diagnosed within the period 
under study. Even with this definition, certain chronic diseases with periods of 
quiescence and activity may at times give trouble because of difficulties in dis- 
tinguishing between new illnesses and recurrent attacks. Obviously, the more is 
known about the natural history of disease, the more easily such questions are 
resolved. 
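
The distinction between counting diseases and counting persons can be made concrete with a small tabulation. The following sketch uses an invented five-person survey; the record layout and condition names are assumptions chosen only to show that the two counts differ when persons have concurrent illnesses.

```python
from collections import Counter

# Hypothetical survey records: each person and the conditions reported.
survey = {
    "person_1": ["diabetes", "hypertension"],
    "person_2": [],
    "person_3": ["hypertension"],
    "person_4": ["arthritis", "diabetes", "hypertension"],
    "person_5": [],
}

# Count of diseases: each disease is counted once per person who has it.
disease_counts = Counter(
    condition for conditions in survey.values() for condition in set(conditions)
)

# Count of persons: a person with several illnesses is counted once.
persons_ill = sum(1 for conditions in survey.values() if conditions)

# Prevalence of each disease per 1,000 population.
population = len(survey)
prevalence_per_1000 = {
    disease: 1000 * n / population for disease, n in disease_counts.items()
}
```

Here six disease cases are found among only three ill persons, so the choice of unit must follow from the objective of the survey.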

While in the study of prognosis the investigator is concerned with the classi- 
fication of severity on clinical grounds, in many population studies a more mean- 
ingful definition from the standpoint of the impact of illness on the community is 
one which is based on socioeconomic considerations. The classification of severity 
may be defined as illness or injury which (1) interferes with the normal daily 
activity of a person (e.g., absence from work), (2) requires confinement to bed, 
and (3) necessitates hospitalization. This is the type of classification employed 
by the National Health Survey. Although such a grouping is subject to some 
variation, since it depends upon the subjective judgment of the individuals 
involved and circumstances not necessarily directly related to the illness, it 
does provide a simple system of classification which can be applied with reasonable 
consistency. Whether the clinical or the socioeconomic classification of severity 
is used is obviously dependent upon the aims of the investigator. 

Classification of Diseases in the Population.—In the statistical description of 
the frequency of various diseases in the population, particularly from the stand- 
point of their public health importance, interest is usually in a group of related 
cases rather than in the individual case. 

The entire range of morbid conditions must be covered in a statistical classi- 
fication, but the number of categories cannot be so extensive that the frequencies 
to be recorded under various categories are too small to be meaningful. A specific 
disease entity should have a separate title only when its separation is warranted 
because the frequency of its occurrence, or its importance as a morbid condition, 
justifies its isolation as a separate category. Many titles in the classification 
will refer to groups of separate but usually related morbid conditions. However, 
every disease or morbid condition, including symptoms, must have a definite 
and appropriate place as an inclusion in one of the categories of the statistical 
classification. Because of the selective nature of a statistical list, there will be 
residual titles for other and miscellaneous conditions which cannot be classified 
under the more specific titles. 

Diseases may be classified in many ways. Perhaps the simplest and most 
understandable method is to classify diseases according to the organ systems of 
the body. Here, all of the diseases affecting particular anatomic parts or sites 
will be categorized under the various organ systems of the body. This would be 
an acceptable system of natural classification of diseases. It would hardly be a 
useful classification, however, unless one were studying different anatomic sites 
and all the diseases affecting them. Since many diseases manifest themselves in 
different parts of the body, a classification whose principal axis is the organ 
system would be impracticable for the statistical compilation of diseases. For 
example, malignant neoplasms will probably be scattered about in one or more 
categories under practically every organ system of the body. If one were interested 
in studying malignant neoplasms of various sites, it would be necessary to identify 
each malignancy category under the various organ systems and bring them to- 
gether. This is an inconvenient and inefficient procedure with danger of errors. 

How then should diseases be classified? This would depend upon the purpose 
to which the statistical compilation is to be put. For example, in a review of data 
on the treatment of injuries in a hospital, the principal interest is in the nature 
of injury. Treatment for a fractured femur is of secondary interest to those en- 
gaged in accident prevention. For this purpose, data on the agency involved, 
including the circumstances of the accident, are needed as the principal axis of 
classification. Such information provides bases for instituting a course of action 
in an attempt to prevent accidental injuries. 

The nature of the medical information to be classified also has an important 
bearing on the usefulness of a classification. An abbreviated morbidity classifi- 
cation would be of limited value in the study of illness treated in a large hospital 
because the required specificity of diagnostic information would be lost. On the 
other hand, a detailed disease classification would not be particularly suitable 
for application to medical information supplied by lay respondents in surveys 
of illness in the general population. Data relating to medical specialties may 
demand quite different axes of classification from those found in a disease classi- 
fication for general purposes. For example, in the rehabilitation of the physically 
disabled, considerable detail is needed on the type of physical impairment by 
anatomic site. The extent of disability as well as the rehabilitation potential are 
important data which cannot be derived from the usual disease classification. 
Such information represents axes of classification very different from those em- 
ployed in a general description of disease in the population. 


DIFFERENCE BETWEEN A NOMENCLATURE AND STATISTICAL CLASSIFICATION 


A word should be said about the difference between a nomenclature and a 
statistical classification because of the frequent confusion as to the purposes of 
these two classificatory systems. The Standard Nomenclature of Diseases and 
Operations is designed primarily to provide an authoritative list of approved 
medical terms and thus facilitate case studies of specific diseases. Although the 
study of individual cases may eventually result in a statistical paper, the Standard 
Nomenclature per se does not offer a satisfactory base for a statistical classifica- 
tion. This is because a separate category is provided for every morbid condition 
that can be specifically described. On the other hand, a statistical classification 
is concerned with groupings of diagnostic entities. For nomenclature purposes, 
for example, specific conditions of the coronary arteries such as aneurysm, 
atheroma, embolism, and thrombosis must be listed separately. However, a 
statistical classification would be concerned with the various conditions of the 
coronary arteries as a group. While it is not impossible to use the Standard 
Nomenclature of Diseases and Operations for statistical purposes, it would be 
extremely cumbersome to do so even for those thoroughly familiar with its 
structure. 
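
The relation between the two systems can be pictured as a many-to-one mapping from nomenclature terms to statistical categories. The sketch below uses the coronary example from the text; the exact wording of the terms, the category title, and the residual label are illustrative assumptions and are not taken from the Standard Nomenclature or any published list.

```python
from collections import Counter

CORONARY = "diseases of the coronary arteries"
RESIDUAL = "other and unspecified conditions"

# Hypothetical nomenclature terms, each listed separately, mapped to one
# statistical category that treats them as a group.
TERM_TO_CATEGORY = {
    "aneurysm of coronary artery": CORONARY,
    "atheroma of coronary artery": CORONARY,
    "embolism of coronary artery": CORONARY,
    "thrombosis of coronary artery": CORONARY,
}

def tabulate(terms):
    """Count recorded diagnostic terms by statistical category."""
    return Counter(TERM_TO_CATEGORY.get(term, RESIDUAL) for term in terms)

recorded = [
    "thrombosis of coronary artery",
    "embolism of coronary artery",
    "atheroma of coronary artery",
    "thrombosis of coronary artery",
    "beriberi heart",
]
table = tabulate(recorded)
```

The nomenclature keeps four distinct terms; the statistical table reports a single frequency for the group, with unmapped terms falling into a residual title.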


THE INTERNATIONAL CLASSIFICATION OF DISEASES 


The International Statistical Classification of Diseases, Injuries, and Causes 
of Death? was designed for general use in the study of disease. It deals first with 
diseases caused by well-defined infective agents; these are followed by categories 
for neoplasms and allergic, endocrine, metabolic, and nutritional diseases. Most 
of the remaining diseases are classified according to their principal anatomic 
site, with special sections for mental diseases, complications of pregnancy and 
childbirth, certain diseases of early infancy, and senility and ill-defined conditions 
including symptoms. The last section provides a dual classification of injuries 
according to the nature of injury and according to the external cause of injury. 

Wherever possible, etiology is the principal axis of classification. This gives 
rise to certain problems. For example, it is impossible to obtain under one heading 
all the cardiovascular diseases, since cardiovascular syphilis is classified under 
its etiological title, syphilis; beriberi heart is found under beriberi; and congenital 
malformations of the cardiovascular system are in the section on congenital 
malformations. A major component of diseases of the cardiovascular system, 
the cerebrovascular diseases, are now classified under diseases of the nervous 
system and sense organs. 

Compromises have been made in the International Classification between 
cause, pathologic condition, anatomic site, age, and circumstances of the onset 
of disease. Experience has shown that it is impossible to maintain a consistent 
axis of classification for all diseases. For example, in the etiological classification 
of injuries, the external cause of injury is not strictly a disease classification but 
a mixture of agent of accident, circumstances of the accident, and the nature of 
injury. This is an untidy section from a classification standpoint and does not 
contribute to clear thinking in terms of causative factors of accidents. However, 
data compiled on the basis of this classification have proved useful. 

Although the present International Statistical Classification of Diseases, 
Injuries, and Causes of Death represents a major change over the past Inter- 
national Lists of Causes of Death to accommodate morbidity coding, the structure 
and framework are essentially those of the International Lists of Causes of Death. 
This is not necessarily objectionable but, because cause-of-death coding pro- 
cedures are so strongly entrenched in the various countries, the instructions in 
the form of notes under the various rubrics are more useful for coding of mortality 
data than of morbidity information. With greater development of morbidity 
statistics, considerable change in orientation can be expected in the future. 


THE DYNAMIC CHARACTER OF DISEASE CLASSIFICATION 


It is a fact which is not always recognized that a disease classification deals 
with diagnostic terms and not with actual diagnoses. Wherever possible, diagnostic 
terms involving related diseases which are difficult to differentiate clinically are 
grouped together. However, a disease classification, no matter how perfect, can- 
not overcome the defects of inaccurate diagnosis and poor reporting. 

Although it is not possible to make any quantitative statements regarding 
the accuracy of clinical diagnosis, there is no question as to the improvement in 
diagnostic methods and facilities. There have been changes in medical concepts, 
and new diseases have been discovered. All of these developments have a profound 
effect on diagnostic practice, on nomenclature of disease, and on the statistical 
classification of diseases. Changes must be effected to keep abreast of develop- 
ments in medical science. These changes, unfortunately, create problems of 
incomparability in time trends of disease incidence which cannot be resolved 
conclusively because of the lack of quantitative information. For example, 
coronary disease was an unknown diagnostic entity until Herrick in 1912 first 
described the clinical manifestations of acute coronary occlusion. Since the 
1920s, when the category “disease of the coronary arteries” was established, 
coronary diseases have been reported with increasing frequency in mortality 
statistics. How much of this increase is real and how much of it is due to better 
recognition of the disease or the growing popularity of the term cannot be de- 
termined. 

There are other examples of new disease entities coming into being. Cystic 
fibrosis of the pancreas was first described as a clinical entity in 1938 by Anderson. ® 
Although “fibrocystic disease of pancreas,” “mucoviscidosis,” and other terms 
denoting cystic fibrosis were being reported, a separate category was not estab- 
lished in the 1955 revision of the International List of Diseases and Causes of 
Death. However, a separate subdivision was created tentatively under “other 
diseases of the pancreas” for use in compiling mortality statistics for the United 
States. It is expected that the 1965 revision will recognize the need for a separate 
category for cystic fibrosis of the pancreas. 

New medical concepts or changes in them influence disease classification. 
Each new disease entity must be tentatively classified even though the disease 
process may not be completely known. Any revision of the classification should 
be made with the full realization that interim changes will complicate the inter- 
pretation of data. 

Disease classifications are basic to the descriptive study of disease. There 
is no classification that is satisfactory for all purposes. Since a disease classifica- 
tion and its application can affect the results of a study in a substantial way, the 
choice of a disease classification for each purpose warrants serious consideration. 


GENERAL PRINCIPLES OF CLASSIFICATION 


The basic logical requirements of any classification are relatively simple. 
The subclasses must be mutually exclusive, and jointly the subclasses must be 
exhaustive of the universe of discourse. The design for grouping data depends 
primarily on the purpose for which the data are to be used and upon the nature 
of the responses or information available from the study. A classification scheme, 
no matter how good it may be, cannot improve the quality of the observations or 
secure more detail than is available on the original record. 
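
The two logical requirements can be checked mechanically for a quantitative variable such as age. In the sketch below each class is a half-open interval of ages; the class labels, boundaries, and sample ages are assumptions made only for illustration.

```python
def matching_classes(value, classes):
    """Labels of every class whose interval [low, high) contains the value."""
    return [label for label, (low, high) in classes.items() if low <= value < high]

def check_classification(values, classes):
    """Return values that fall in more than one class and values that fall in none."""
    overlapping = [v for v in values if len(matching_classes(v, classes)) > 1]
    unclassified = [v for v in values if len(matching_classes(v, classes)) == 0]
    return overlapping, unclassified

# Mutually exclusive and exhaustive age classes (hypothetical boundaries).
sound = {"under 15": (0, 15), "15-44": (15, 45), "45-64": (45, 65), "65 and over": (65, 200)}

# Faulty scheme: age 45 belongs to two classes, and ages 65 and over to none.
faulty = {"under 15": (0, 15), "15-45": (15, 46), "45-64": (45, 65)}

ages = [3, 15, 45, 64, 70]
sound_result = check_classification(ages, sound)
faulty_result = check_classification(ages, faulty)
```

An empty first list shows the classes are mutually exclusive over the data at hand; an empty second list shows they are exhaustive.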

There are a number of guiding principles in the development of a classifica- 
tion scheme: 

(1) The categories of a classification should be mutually exclusive and clearly 
defined. Overlapping between the rubrics or subclasses dilutes and confuses re- 
sults. If the precision of observation does not permit differentiation between two 
rubrics, say, cerebral embolism and thrombosis, these should be thrown into the 
same subclass. 

(2) The rubrics or categories under the various items should be related to the 
item and meaningful in terms of their eventual use. For example, in a survey 
involving school children, the subjects may be classified by age or by school 
grade. If the classification is by grade in school, the subclasses should not be by 
chronologic age. The groupings should be by school grade and made in such a 
way that they discriminate between meaningful subdivisions in the school system. 

(3) The categories must be selected in such a way that the data within 
each subclass are relatively homogeneous. 

(4) Provision should be made for the classification of every case. Except in 
unusual circumstances, not all of the possibilities can be exhausted with a set 
of specific rubrics. Therefore, it is generally advisable to provide a residual cate-
gory for "other" in which information not classifiable elsewhere in the specific
categories may be placed. Sometimes "other" may be combined with "unknown
or unspecified." A large frequency in the residual or unspecified categories indi-
cates that the classification needs modification or that the responses are of poor
quality.

(5) A reasonable frequency distribution should be obtainable from a good
classification system. If all the frequencies are concentrated in one or two rubrics 
with few or no events scattered in the remaining groupings, the inclusions under 
the rubrics with high frequencies should be re-examined to see if further sub- 
divisions of those categories are possible. Unless the relative frequencies in the 
various categories are roughly known from prior experience, it will be advisable 
to review a sample of the records in designing the classification scheme. All new 
classifications should be pretested on a sample prior to large-scale use. 

(6) The finer the classification, the more specific will be the data. However, 
it should be possible to discriminate between the contents of one subclass and 
another. A detailed classification will result in smaller frequencies per cell in 
the tabulations. In order to build up the frequencies, a sample of larger size may
be needed or certain subclasses may be grouped. In general, a finer classification is
to be preferred over a broad one. Detailed subclasses can always be grouped,
but greater detail cannot be secured from a too broad classification.

(7) All classifications and editing procedures should be clearly specified
in writing. Nothing should be trusted to memory.
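
Several of these principles can be made concrete in a short sketch. The rubric names, the lookup table, and the sample records below are hypothetical and serve only as illustration; they are not drawn from any established disease classification. The sketch assigns every record to exactly one rubric, pools two diagnoses that cannot be told apart, and reports how much of a pretest sample falls into the residual category.

```python
from collections import Counter

# Hypothetical rubrics: each recorded diagnosis maps to exactly one rubric,
# so the subclasses cannot overlap (principle 1). Embolism and thrombosis
# are pooled because the records are assumed not to distinguish them.
RUBRICS = {
    "cerebral embolism": "cerebral embolism or thrombosis",
    "cerebral thrombosis": "cerebral embolism or thrombosis",
    "cerebral hemorrhage": "cerebral hemorrhage",
}
RESIDUAL = "other, unknown, or unspecified"


def classify(diagnosis):
    """Assign a recorded diagnosis to one rubric; unmatched records go to the residual."""
    return RUBRICS.get(diagnosis.strip().lower(), RESIDUAL)


def tabulate(diagnoses):
    """Return the frequency of each rubric; every case is counted exactly once."""
    return Counter(classify(d) for d in diagnoses)


def residual_share(frequencies):
    """Proportion of all cases that fell into the residual category (principle 4)."""
    total = sum(frequencies.values())
    return frequencies[RESIDUAL] / total if total else 0.0


# Pretest on a small, made-up sample of records before large-scale use.
sample = [
    "Cerebral embolism", "cerebral thrombosis", "cerebral hemorrhage",
    "cerebral hemorrhage", "stroke, type not stated",
]
frequencies = tabulate(sample)
share = residual_share(frequencies)
print(frequencies)
print(f"residual share: {share:.0%}")
```

A large residual share in such a pretest would suggest, as noted under (4), that the rubrics need modification or that the records are of poor quality.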


SUMMARY 


Classification, or the orderly arrangement of facts for the purpose of gen- 
eralizing from the data at hand, is a fundamental process in the study of the 
natural history of disease. The mass of data must be reduced to manageable 
proportions by classifying it into groups. These groupings may relate to the 
demographic characteristics of the population or to the characteristics of the 
disease. The classifications and definitions used delineate the scope and area of 
inquiry; furthermore, they fix the amount of detailed information that can be 
made available from the study. It is possible to collapse a detailed classification 
into broader groupings, but the reverse is not possible without reclassifying the 
data. 
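
The asymmetry between collapsing and refining a classification can be shown in a few lines. The grades, counts, and group names below are hypothetical. Because the grouping is many-to-one, the broad tabulation follows from the detailed one by simple addition, whereas the detailed counts cannot be recovered from the broad totals.

```python
from collections import Counter

# Hypothetical detailed tabulation by single school grade.
detailed = Counter({
    "grade 1": 30, "grade 2": 28, "grade 3": 31,
    "grade 7": 25, "grade 8": 27,
})

# Hypothetical many-to-one grouping of detailed rubrics into broader ones.
BROAD_GROUP = {
    "grade 1": "primary", "grade 2": "primary", "grade 3": "primary",
    "grade 7": "junior high", "grade 8": "junior high",
}


def collapse(frequencies, grouping):
    """Sum detailed frequencies into broader groups; the grand total is preserved."""
    broad = Counter()
    for rubric, count in frequencies.items():
        broad[grouping[rubric]] += count
    return broad


broad = collapse(detailed, BROAD_GROUP)
print(broad)
```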

There are many ways in which diseases and other attributes may be classified.
Of paramount consideration is how well a particular classification suits the
purpose of the study.

Understanding of the data and of the logic of classification is essential in
any study. They represent important aspects of the scientific method.


DISTRIBUTION OF DISEASE IN THE POPULATION

Every student of disease has become increasingly interested in its distribution
in a population and in those factors which influence this distribution, that 
is, in the epidemiologic aspects of disease. The utility of such knowledge has been 
repeatedly demonstrated in studies of the infectious diseases and also in some 
diseases of current medical and public health interest, such as heart disease and 
cancer. In this rather brief survey, an attempt will be made to illustrate the 
usefulness of such knowledge and the methods by which such information is 
obtained and by which these distributions are measured. Many of these aspects 
have been discussed in recent reviews and textbooks.!~* 

In studying the distribution of a disease in a population, attention is paid 
to variations in disease frequency by such factors as time, place, and certain 
individual characteristics of the people who are affected. Thus, the epidemiolo- 
gist studies the changes of the frequency of the disease over a period of time, the 
distribution by various geographic areas, and the frequency of the disease by 
age, sex, race, socioeconomic class, and individual living habits (such as smoking, 
alcohol consumption, etc.). Essentially, he attempts to determine whether 
there are any statistical associations of the disease with these factors. From the
pattern of these statistical associations, he derives hypotheses concerning the
nature of the disease, which can be tested by additional epidemiologic, clinical,
and/or laboratory investigations.


USES

Knowledge of the population distribution of disease serves several useful
purposes: 
(1) It leads to the development of hypotheses concerning etiological factors. 
(2) It can be used to determine if hypotheses developed in the laboratory 
or clinic are consistent with the population distribution of the disease, as well 
as to determine if such a distribution is consistent with the distribution of a 
possible etiological factor, as determined by epidemiologic means. 
(3) It serves as a scientific basis for disease control measures.

We shall briefly discuss and illustrate these uses, emphasizing those concerned
with the determination of etiological factors in disease.


DEVELOPMENT OF ETIOLOGICAL HYPOTHESES 


This will be illustrated by a study of a current problem. In 1955 Hewitt® 
reported the results of an analysis of the trend of the mortality rate of leukemia 
in England and Wales in specific age groups. He noted that a marked increase 
in the mortality rate had occurred in the third and fourth years of life from 1931 
to 1953 (Fig. 1). It was thought that these trends might be a reflection of the 
introduction of an environmental agent either in the prenatal or postnatal
period.

Fig. 1. Mortality rates of leukemia in children under 5 years of age, England and Wales, 1931 to 1953.
(After Hewitt, D.: Brit. J. Prev. & Social Med. 9:81, 1955.)

Since previous work had demonstrated a relationship between irradiation
and leukemia and since over this period of time there had been an increased 
utilization of roentgenographic pelvimetries in obstetric practice, it occurred to 
Hewitt and his colleagues that exposure of the fetus to antenatal x-rays might 
be of etiological importance in childhood leukemia. To test this hypothesis, an 
investigation was carried out in which information was obtained on 1,299 deaths 
from malignant diseases among those under 10 years of age, as well as on a control 
group. The detailed results of this study are reported elsewhere,’ but Tables I 
and II give the results of the study with respect to the relationship of these 
malignant diseases to antenatal irradiation. Table I shows that a larger proportion 
of the patients with cancer had a history of exposure to fetal irradiation for each 
birth rank than was found among the control groups. (Each birth rank was 
analyzed separately since the frequency of roentgenographic pelvimetry, etc.,
varies with birth rank.) Since all malignant conditions were included in the study, 
the investigators were interested in determining the specific types of malignant 
diseases associated with such exposure; the results of this analysis are presented 
in Table II. In general, the relationship with irradiation was found to be present
not only for leukemia but for many other malignant conditions in childhood.


TABLE I. HISTORIES OF DIRECT FETAL IRRADIATION, DISTINGUISHING THREE BIRTH-RANK
GROUPS OF CASES AND CONTROLS*

POSITION IN FAMILY             CASES                   CONTROLS
OF CHILD                NUMBER      PER CENT     NUMBER      PER CENT     RATIO
(BIRTH RANK)

First                   85/510                   36/427
Second                  47/393                   28/448
Later                   46/396                   29/424

All                     178/1,299                93/1,299

*Statistics from Stewart, A., Webb, J., and Hewitt, D.: Brit. M. J. 1:1495, 1958.


It is of interest that the relationship of childhood leukemia with irradiation has 
been independently confirmed by two other investigators.*:’ In interpreting these 
results, Stewart, Webb, and Hewitt’ pointed out that the number of childhood 
malignant diseases attributable to irradiation is small relative to their increase 
over the past two decades. 


TABLE II. COMPARATIVE FREQUENCY OF DIRECT FETAL IRRADIATION AMONG PATIENTS WITH
LEUKEMIA AND OTHER MALIGNANT DISEASES AND AMONG CONTROL GROUP*

                                                    IRRADIATED IN UTERO
DIAGNOSIS                                TOTAL      NUMBER      PER CENT

Leukemia, all types                        619          79
Other types of malignant diseases          680          99

Control group                            1,299          93

*After Stewart, A., Webb, J., and Hewitt, D.: Brit. M. J. 1:1495, 1958.
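
The comparison made in Tables I and II rests on simple proportions, which can be recomputed from the counts. The sketch below uses only the numbers of irradiated children and the group totals given in the tables. The percentages it prints are recomputed here rather than copied from the source, and the "ratio" is assumed to be the per cent irradiated among cases divided by that among controls, which may not be the definition used in the original table.

```python
# Counts taken from Tables I and II as (irradiated, total).
TABLE_I = {
    "First": {"cases": (85, 510), "controls": (36, 427)},
    "Second": {"cases": (47, 393), "controls": (28, 448)},
    "Later": {"cases": (46, 396), "controls": (29, 424)},
    "All": {"cases": (178, 1299), "controls": (93, 1299)},
}
TABLE_II = {
    "Leukemia, all types": (79, 619),
    "Other types of malignant diseases": (99, 680),
    "Control group": (93, 1299),
}


def per_cent(irradiated, total):
    """Percentage of a group with a history of fetal irradiation."""
    return 100.0 * irradiated / total


def case_control_ratio(row):
    """Per cent irradiated among cases divided by that among controls (assumed definition)."""
    return per_cent(*row["cases"]) / per_cent(*row["controls"])


for rank, row in TABLE_I.items():
    print(f"{rank:<7} cases {per_cent(*row['cases']):5.1f}%  "
          f"controls {per_cent(*row['controls']):5.1f}%  "
          f"ratio {case_control_ratio(row):4.2f}")

for diagnosis, (irradiated, total) in TABLE_II.items():
    print(f"{diagnosis:<35} {per_cent(irradiated, total):5.1f}%")
```

Keeping the birth ranks separate, as the authors did, guards against a difference that merely reflects the varying frequency of pelvimetry by birth rank.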


It is needless to point out that the results of these studies require further
confirmation both by similar retrospective investigations and by prospective
types of study, to be discussed later. However, the former studies do provide
an idea of the mode of investigation, the types of inferences made, and the utility
of information on the distribution of disease in the population.


DETERMINATION OF CONSISTENCY


Information concerning the distribution of a disease in a population is useful 
in determining consistency with various etiological hypotheses. This can be 
viewed in two different ways, each of which will be illustrated. 

The term "consistent" is used in the sense that an etiological hypothesis
determined in the clinic or laboratory should be consistent with the distribution 
of disease in a human population. To the extent that it is not consistent, the 
hypothesis will have to be modified. For example, in 1955 Holsti and Ermala!® 
produced cancer of the bladder in mice by the application of tobacco tar to the 
buccal mucous membrane. To determine if such a relationship also exists in 
humans, Lilienfeld, Levin, and Moore! analyzed the smoking histories of pa- 
tients with cancer of the bladder and other types of cancer and conditions who 
had been admitted to a cancer hospital. The results presented in Table III
show that there is an association between cancer of the bladder and cigarette 
smoking. Thus, a relationship determined by animal experimentation was con- 
firmed in a human group. 

Consistency can be viewed in another light. Let us assume that we have 
knowledge of the distribution of disease in a population and there are variations 
in the frequency of the disease by such characteristics as age, sex, race, geographic 
area, etc. Let us also assume that various studies have indicated a relationship 
between this disease and a certain individual characteristic. The question might 
then be raised as to whether the variations in frequency of the disease in different
population segments are consistent with the variations in the frequency of
people with the given characteristic.


TABLE III. PERCENTAGE OF MEN AGED 45 YEARS AND OVER WITH HISTORY OF TOBACCO USE
BY CLASS OF PATIENT*

                                                     PER CENT WHO USE TOBACCO

                                                                  CIGARETTES
                                                                  ALONE AND IN
CLASS OF PATIENT                            ANY TYPE  CIGARETTES  COMBINATION   ANY TYPE
                                            OF        ONLY        WITH OTHER    OTHER THAN
                                            TOBACCO               TYPES OF      CIGARETTES
                                                                  TOBACCO

Actual percentages
  Cancer of the bladder              321
  Benign conditions of the bladder    39
  No disease                         337
  Cancer of the prostate             287
  Lung cancer

Age-adjusted percentages
  Cancer of the bladder
  Benign conditions of the bladder
  No disease
  Cancer of the prostate
  Lung cancer

*Statistics from Lilienfeld, A. M., Levin, M. L., and Moore, G. E.: A. M. A. Arch. Int. Med. 98: 1956.

TABLE IV. AGE-ADJUSTED AND SPECIFIC LUNG CANCER MORTALITY RATES PER 100,000 POPULATION
BY SEX AND URBAN-RURAL RESIDENCE, WITH URBAN/RURAL AND MALE/FEMALE RATIOS,
1948 AND 1949, UNITED STATES*

                                 MALE                           FEMALE               URBAN     RURAL
                                                                                     MALE/     MALE/
AGE                     URBAN | RURAL | U/R RATIO      URBAN | RURAL | U/R RATIO     FEMALE    FEMALE
                                                                                     RATIO     RATIO

35-44
45-54
55-64
65 and over

Total, age adjusted

*After Haenszel, W., and Shimkin, M. B.: J. Nat. Cancer Inst. 16:1417, 1956.


TABLE V. URBAN/RURAL RATIOS OF LUNG CANCER DEATH RATES BY AGE AND SEX, AFTER TAKING
INTO ACCOUNT VARIATION IN SMOKING HABITS*

45-54                   1.07 ± 0.04
55-64
65 and over

*After Haenszel, W., and Shimkin, M. B.: J. Nat. Cancer Inst. 16:1417, 1956.


To illustrate this approach, Table IV presents data on the variation of 
mortality from lung cancer by age, sex, urban-rural residence, race, and geographic 
region. Note the increasing mortality with increasing age for both sexes in both 
urban and rural areas. For each sex there is a greater mortality for those residing 
in urban areas, and in both the rural and urban areas there is a greater mortality 
for males. There is also a greater mortality for those of the white race than for 
the nonwhite population and in the northeastern region of the country as com- 
pared to other regions, for both urban and rural areas. Since lung cancer has been 
found to be related to cigarette smoking in many studies, it is pertinent to ask 
if the distribution of the mortality rate of lung cancer by these variables is 
consistent with the variations in frequency of cigarette smoking in these popula- 
tion subgroups. Haenszel and Shimkin” reported the results of a study in which 
they obtained data on the frequency of cigarette smoking in a sample of the 
United States population and statistically determined if the variations in the 
mortality rate could be accounted for by the variations in cigarette smoking 
frequency. Their results are summarized in Fig. 2. In this graph, where one of the 
marks (circle, cross, etc.) falls on the diagonal, agreement between observation 
(observed ratio) and expectation (predicted ratio) on the basis of cigarette smok- 
ing frequency is indicated. Note that with respect to regional and racial differences 
there is close agreement between observation and expectation, whereas for sex and
urban-rural residence the observed ratios are higher. This indicates that the 
higher death rate for lung cancer among men and those residing in urban areas 
is not completely explainable by increased frequency of cigarette smoking among 
men and in cities, although after cigarette smoking is taken into account the 
excess risk in these two subgroups is diminished to some extent. This may be 
more clearly illustrated by comparing Table IV with Table V, which presents 
the urban/rural ratio for each sex and age group after having taken into account 
smoking habits. 
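
The reasoning behind a "predicted ratio" can be sketched under a deliberately simple model. This is not Haenszel and Shimkin's procedure; it assumes that nonsmokers have the same death rate in both groups compared and that smokers have a fixed multiple of that rate. All numerical inputs are hypothetical. An observed ratio that exceeds the predicted one indicates an excess not accounted for by smoking frequency.

```python
def predicted_rate_ratio(p_smokers_a, p_smokers_b, relative_risk):
    """Rate ratio of group A to group B expected from smoking frequency alone.

    Assumes nonsmokers have the same rate in both groups and smokers have
    `relative_risk` times that rate, so each group's rate is proportional
    to 1 + p * (relative_risk - 1), where p is the proportion of smokers.
    """
    rate_a = 1.0 + p_smokers_a * (relative_risk - 1.0)
    rate_b = 1.0 + p_smokers_b * (relative_risk - 1.0)
    return rate_a / rate_b


# Hypothetical inputs, chosen only for illustration.
observed = 1.80
predicted = predicted_rate_ratio(0.60, 0.50, relative_risk=10.0)
unexplained = observed / predicted
print(f"predicted {predicted:.2f}, observed {observed:.2f}, "
      f"observed/predicted {unexplained:.2f}")
```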

Fig. 2. Summary of observed and predicted ratios of lung cancer rates derived by comparing
population with respect to sex (cross), urban-rural residence (squares), race (triangles), and region
(circles). (After Haenszel, W., and Shimkin, M. B.: J. Nat. Cancer Inst. 16:1417, 1956.)

Another illustration of this approach is available from several studies on
leukemia. Recently, MacMahon and Koller’ reported that the frequency of
leukemia among Jewish residents in Brooklyn, New York, was about twice
that of the non-Jewish residents. They considered the possibility that the Jewish 
population may have been exposed to more irradiation, since there is evidence 
that they receive more medical care than other religious groups in Brooklyn. 
In 1956 a survey of the adult population of Buffalo, New York, was carried out 
with several objectives in mind, one of which was to determine the frequency of 
exposure to various forms of irradiation by various population characteristics, 
including religion. Despite the fact that the data from Brooklyn and Buffalo 
may not be comparable, it was considered desirable to see if exposure to irradia- 
tion differed in various religious groups. The results are presented in Table VI. 
Since the total number of Jews in the sample was relatively small and since a 
majority of them were in the upper socioeconomic group, these comparisons
were limited to this group. Clearly, for both sexes, the Jewish group had more 
diagnostic x-ray examinations and more x-ray therapy than the other religious
groups. The differences were not large with respect to examinations, but thera-
peutic procedures were almost twice as frequent among Jews as among non-Jews.
These observations are clearly consistent with the variations in frequency of 
leukemia. Needless to say, they require confirmation, particularly in the same 
areas where information concerning frequency of leukemia is obtained. 

Consistency of the population distribution of a disease and of the possible
causative factor obviously increases confidence in the plausibility of an etiological
relationship between the two. In fact, the finding of complete consistency can be
considered as being similar to "replication" in experimentation. In this sense
we would interpret consistency as a replication of the relationship in various
subgroups of the population, each of which differs from the others with respect to
some characteristic.


TABLE VI. AGE-ADJUSTED PERCENTAGES OF PERSONS WHO HAD ONE OR MORE DIAGNOSTIC X-RAY
EXAMINATIONS DURING 12 MONTHS PRECEDING INTERVIEW AND PERCENTAGES OF THOSE WHO HAD
X-RAY OR RADIUM TREATMENTS DURING LIFETIME, IN UPPER SOCIOECONOMIC GROUP, BY RELIGION
AND SEX*

                          X-RAY EXAMINATIONS          X-RAY OR RADIUM TREATMENTS
RELIGION                  NUMBER OF     PER CENT      NUMBER OF     PER CENT
                          RESPONDENTS   EXAMINED      RESPONDENTS   TREATED

Jewish       Men               59         54.1             61
             Women             65         55.5             65

Protestant   Men              341
             Women            426

Catholic     Men              536         49.4
             Women            634         39.7

*After Lilienfeld, A. M.: Pub. Health Rep. 74:29, 1959.


BASIS FOR DISEASE CONTROL MEASURES 


The usefulness of knowledge of disease distribution in a population as one of 
the bases for disease control measures is well known. However, it might be well 
to give a few examples, although simple ones. Data indicating that frequency of 
tuberculosis is very high in certain population groups, e.g., American Indians 
and Negroes, are of importance in providing a focal point for such public health 
measures as vaccination, chest x-ray surveys, etc. In the case of noninfectious 
diseases such as diabetes, knowledge that there is familial aggregation is impor- 
tant in indicating that, when a case is diagnosed, it would be worth while to screen 
family members of the patient as a means of early diagnosis. 


TYPES OF DATA


The above examples and the discussion in other papers in this symposium 
provide some idea of the various kinds of studies and resultant data that are 
utilized to obtain information concerning disease distribution. However, it 
may be well to briefly classify and review them for completeness of the present 
discussion. The types of studies can be classified as follows: 


A. Studies of general population characteristics
   1. Vital statistics, including mortality data and reporting of morbidity
   2. Special population surveys

B. Studies of individual characteristics
   1. Retrospective studies
   2. Prospective studies


Studies of the distribution of disease according to general population character- 
istics are concerned with such matters as the time trend in mortality from a 
specific disease and the frequency of disease in various subgroups of the popula- 
tion (social class, industry, race, etc.). They attempt to determine if there are 
differences which may serve as a "lead," as a basis for more definitive types of
studies. On the other hand, studies of individual characteristics are concerned 
with determining the relationship of characteristics of individuals with a disease 
as compared to those without the disease. Observed differences also would provide 
a lead for further study or might even uncover a specific etiological agent. Such 
studies can be of two types, retrospective or prospective. The retrospective study 
is concerned with comparing diseased and nondiseased individuals with respect 
to possible antecedent etiological factors, whereas the prospective approach 
starts with a group of individuals with the possible etiological factor and a group 
without such a factor. These groups are observed for a number of years to de- 
termine if they differ with respect to the frequency of the disease of interest. These 
studies are discussed in detail by Cornfield and Haenszel elsewhere in this 
symposium. 
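
The difference between the two designs lies in which groups are fixed at the outset and which frequency is then compared. The following sketch makes this explicit. The function names and all counts are hypothetical and are not taken from any study cited here.

```python
def retrospective_comparison(cases_with_factor, cases_total,
                             controls_with_factor, controls_total):
    """Retrospective design: diseased and nondiseased groups are chosen first,
    and the frequency of the antecedent factor is compared between them."""
    return (cases_with_factor / cases_total,
            controls_with_factor / controls_total)


def prospective_comparison(diseased_with_factor, with_factor_total,
                           diseased_without_factor, without_factor_total):
    """Prospective design: groups with and without the factor are chosen first,
    and the frequency of disease developing during follow-up is compared."""
    return (diseased_with_factor / with_factor_total,
            diseased_without_factor / without_factor_total)


# Hypothetical counts, for illustration only.
factor_in_cases, factor_in_controls = retrospective_comparison(60, 200, 30, 200)
disease_in_exposed, disease_in_unexposed = prospective_comparison(40, 1000, 10, 1000)

print(f"retrospective: factor present in {factor_in_cases:.0%} of cases "
      f"and {factor_in_controls:.0%} of controls")
print(f"prospective: disease developed in {disease_in_exposed:.1%} of those with "
      f"the factor and {disease_in_unexposed:.1%} of those without")
```

Only the prospective comparison yields the frequency of disease directly; the retrospective comparison yields the frequency of the factor.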

These two general approaches are not necessarily mutually exclusive, 
in that a particular study might attempt to obtain information concerning both 
broad general characteristics and specific individual characteristics. It is helpful, 
however, to make such a distinction since it influences the type of biologic in-
terpretation that could be made of an association between a disease and a char-
acteristic. This will be discussed in more detail later.


MEASUREMENT 


Although the methods of measuring and expressing the distribution of 
disease in a population are discussed in various texts and previous articles,>:® 
it might be well to present briefly and define the types of rates that are more 
commonly used. These rates have three essential elements: (1) a population group 
exposed to the risk of disease or death, (2) a time factor, and (3) the number of 
cases of disease or deaths occurring in the exposed population during or at a 
certain time period. For example: 


$$\text{Death rate from lung cancer} = \frac{\text{Number of deaths from lung cancer per year}}{\text{Number of persons exposed to risk of dying from lung cancer}} \times 1{,}000.$$


This particular death rate is expressed in terms of 1 year and 1,000 population. 
The unit of time can be whatever one chooses, but it should be specified by the 
investigator. Also, the population unit in which it is expressed (i.e., per 1,000 
or per 100,000) can vary but should be specified. 

Rates can be specific for various characteristics. Thus one could have 
age-specific or sex-specific death rates. For example: 

$$\text{Annual age-specific death rate from all causes for those under 10 years of age} = \frac{\text{Number of all deaths under 10 years of age during 1 year}}{\text{Number of individuals under 10 years of age}} \times 1{,}000.$$


It is important to note that the numerator and denominator are related to each 
other in that the numerator represents those individuals to whom the specific 
event occurs and the denominator those to whom the event could occur, or to 
express it differently, those who are exposed to the risk of the event happening. 
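
A brief sketch may help fix the three elements of a rate: the events, the population exposed to risk, and the stated units. The age groups and counts below are hypothetical, and the default of 1,000 is simply the population unit used in the definitions above.

```python
def rate(events, population_at_risk, per=1_000):
    """Events per `per` persons exposed to risk during the stated time period."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return events / population_at_risk * per


# Hypothetical one-year counts by age group: (deaths, persons exposed to risk).
counts = {"under 10": (120, 40_000), "10 and over": (1_880, 160_000)}

# Age-specific rates: numerator and denominator refer to the same age group.
age_specific = {age: rate(deaths, persons) for age, (deaths, persons) in counts.items()}

# Rate for all ages combined: total deaths over total population at risk.
total_deaths = sum(deaths for deaths, _ in counts.values())
total_persons = sum(persons for _, persons in counts.values())
all_ages = rate(total_deaths, total_persons)

for age, value in age_specific.items():
    print(f"{age}: {value:.2f} per 1,000 per year")
print(f"all ages: {all_ages:.2f} per 1,000 per year")
```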

For expressing morbidity, two general types of rates are available, the 
incidence and prevalence rates, which are defined as follows: 

$$\text{Incidence rate} = \frac{\text{Number of new cases of a disease that develop during a certain period of time}}{\text{Number of persons exposed to risk of developing the disease}} \times 1{,}000.$$

$$\text{Prevalence rate} = \frac{\text{Number of cases of disease existing at a specified time}}{\text{Average number of persons in population at a specified time}} \times 1{,}000.$$


Clearly, the incidence rate directly provides a measure of the probability of 
developing a disease which is the most desirable type of measure. However, 
prevalence rates are of importance since any differences in such rates may provide 
a lead for further, more detailed investigation. 
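
The distinction between the two morbidity rates can be seen in a small numerical sketch. All counts are hypothetical. The sketch also assumes, for illustration only, that persons who already have the disease on the survey date are not at risk of developing it anew.

```python
def incidence_rate(new_cases, persons_at_risk, per=1_000):
    """New cases arising during the period, per `per` persons exposed to risk."""
    return new_cases / persons_at_risk * per


def prevalence_rate(existing_cases, population, per=1_000):
    """Cases existing at the specified time, per `per` persons in the population."""
    return existing_cases / population * per


# Hypothetical community surveyed on one date and followed for one year.
population = 50_000
existing_cases = 400               # cases found on the survey date
new_cases_during_year = 100        # cases developing during the following year
persons_at_risk = population - existing_cases   # assumption stated above

prevalence = prevalence_rate(existing_cases, population)
incidence = incidence_rate(new_cases_during_year, persons_at_risk)
print(f"prevalence: {prevalence:.1f} per 1,000")
print(f"incidence:  {incidence:.1f} per 1,000 per year")
```

In this made-up community the prevalence is several times the annual incidence, as would be expected of a disease of long duration.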


INTERPRETATIONS 


From our discussion and examples it is clear that, from the viewpoint of 
utilizing knowledge of the distribution of a disease in a population for determining 
etiological factors in disease, this type of information provides a series of statistical 
associations of disease with certain population characteristics from which we 
would like to derive biologic inferences. There is no need to point out that there 
are many difficulties in the interpretation of statistical associations based on 
observations of natural phenomena, some of which have been discussed by 
others.'®!7 Since many of the problems in this area are under current discussion by 
many investigators, we cannot consider the present discussion to be complete; all 
we can hope to do is indicate some of the problems. 

In considering the interpretation of such associations, it is helpful to dis-
tinguish those associations based on studies of general population characteristics 
from those based on studies of individual characteristics. For example, there has 
been an increasing death rate from lung cancer over the past several decades. 
During this same period of time the consumption of cigarettes has increased. 
Thus, there is an association of these two variables in time. However, this type 
of association may be an indirect one and a result of other factors whose frequency 
has increased over this same time period, such as the use of automobiles. Such 
an indirect association may have no biologic significance, and any biologic 
interpretation must be made with considerable caution. Similarly, if it is found 
that a certain disease is more frequent in the lower socioeconomic groups of the 
population than in the upper, it may be that there is no biologic relationship 
between the disease and socioeconomic status. The association may reflect either 
a biologic relationship with another factor that is more common in the lower 
social groups or the fact that individuals who have the disease drift downward 
to the lower social groups because of economic difficulties, such as inability to 
hold a skilled job or the cost of medical care. Despite these problems of inter- 
pretation, observations of such differences are important since they provide 
leads to possible biologic hypotheses which form the basis for more definitive 
types of investigation. 
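
The hazard of interpreting an association in time can be demonstrated with made-up figures. In the sketch below, three hypothetical index series all rise over the same period. The lung cancer series correlates almost perfectly with the cigarette series and equally well with the automobile series, so the correlation by itself cannot distinguish between the two.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical index numbers at five-year intervals; none are real data.
cigarettes = [100, 130, 170, 220, 280]
automobiles = [20, 24, 27, 33, 40]
lung_cancer = [3, 5, 8, 12, 17]

r_cigarettes = pearson_r(cigarettes, lung_cancer)
r_automobiles = pearson_r(automobiles, lung_cancer)
print(f"lung cancer with cigarettes:  r = {r_cigarettes:.3f}")
print(f"lung cancer with automobiles: r = {r_automobiles:.3f}")
```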

On the other hand, an association determined as a result of studies of in- 
dividual characteristics has a greater likelihood of being a direct one. For example, 
in the case of lung cancer a larger percentage of patients with the disease are 
smokers than is found in a group of "controls." Such an association is one deter-
mined within individuals, and, clearly, the likelihood of this being a reflection of a
true biologic relationship is greater than the association of time trends. The dis- 
tinction between these two types of association is important to bear in mind since 
the second kind of association has been criticized as if it were the more indirect 
first type of association. 

In discussing inferences to be derived from population studies it would be 
best to follow through sequentially the series of population studies that an in- 
vestigator may carry out as he attempts to develop etiological hypotheses. 
Let us assume that he develops a lead from an analysis of mortality or morbidity 
data. He then carries out a retrospective study of patients with the disease and 
a control group and determines that the association between disease and the
presumed etiological factor is a more direct one. However, since the investigator
is still uncertain whether the supposed etiological factor preceded or followed the
disease state or even developed concomitantly with the disease, he carries out a
prospective study in which he selects individuals with and without the possible
etiological factor and studies them for a period of time. Let us further assume
that the results of such a study confirm the relationship and also indicate that
the disease followed the etiological factor. The investigator is then faced with the
problem of determining more definitively that the association is direct and not
indirect in the sense that both the disease and possible etiological factor reflect a
common factor, i.e., perhaps another unknown factor results in both the disease
and the presumed etiological factor.

To test this possibility, he has several courses open to him. He may carry
out a randomized, well-controlled experiment in human population groups; 
this has been possible in only a few situations. Such experiments can be done in 
various ways. For example, in the case of the cigarette smoking—lung cancer 
relationship, one could theoretically select a sample of nonsmoking teen-agers and 
randomly allocate them to two groups. One group would be given cigarettes to 
smoke and the other would remain nonsmokers. Both groups would
be followed for a number of years to determine the frequency of occurrence of 
lung cancer. If lung cancer were more common in the smoking group, there would 
be no question concerning a causal interpretation of the association. Needless to 
say, the chances of doing such an experiment are small. 
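
The essential step in such an experiment, random allocation, can be sketched briefly. The subject numbers and the seed below are hypothetical. A fixed seed is used only so that the allocation can be reproduced and checked; it is chance, not the investigator or the subject, that decides group membership.

```python
import random


def allocate(subject_ids, seed=None):
    """Randomly divide subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)       # separate generator, reproducible when seeded
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject numbers 1 through 20.
group_a, group_b = allocate(range(1, 21), seed=1)
print("group A:", sorted(group_a))
print("group B:", sorted(group_b))
```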

In some situations experimentation is feasible. For example, with respect 
to the inverse relationship of fluorides in the water supply and dental caries, 
it was possible to take two similar communities, add fluorides to the water supply 
of one, and use the other community as a control. The children in these communi- 
ties were followed periodically and the frequency of dental caries was determined 
to be lower in the community to whose water supply fluorides were added.!* 
Such results provide the evidence necessary to indicate a causal relationship. 

If experimentation is not possible, the investigator may then determine 
whether the distribution of the disease in the population is consistent with the 
population distribution of the possible etiological factor. He attempts to determine 
whether the association is present under a great variety of circumstances. The
presence of such consistency increases the confidence with which he may regard
both the association and its interpretation as an etiological relationship. In
addition, he may determine the characteristics of those individuals with the
etiological factor as compared to those without, to see if he can unearth the
possible common denominator which may be the alternative explanation of the 
association. Again, if these people do not differ with respect to many other 
biologically significant characteristics, his confidence in interpreting the statistical 
association as being indicative of an etiological relationship is increased. 

While he is carrying out these studies, he is constantly evaluating the degree 
and biologic reasonableness of the association. A very high degree of association of 
the disease with the presumed etiological factor and conformity of the association 
with existent biologic knowledge further increase his confidence. 

During the time when these field studies are being carried out, it may be 
possible to perform well-controlled animal experiments. Even though positive 
evidence resulting from such experiments does not necessarily allow general- 
ization to the human situation, these experiments provide a biologic model 
in which various details can be investigated and provide further suggestions 
for studies of human population groups. Again, positive evidence further increases 
the degree of confidence in an etiological hypothesis. At this point it might be 
well to point out that, in all of these studies, there are many serious sampling 
problems which have to be considered and evaluated at each stage of the process. 

In essence, the investigator constructs a chain of evidence until he reaches 
a desired degree of confidence in his interpretation. If he has been unable to carry 
out a well-controlled human experiment, there must always remain a certain 
chance, although it may be small, that his inferences are erroneous. It is possible
that the results of a study may be consistent with a biologic theory which may 
turn out to be false a century later. As has been discussed elsewhere, however, 
the investigator has to decide when the degree of plausibility in an etiological 
relationship has become sufficient for determining some course of action.‘% 

Population studies do not cease at this point. The investigator continues 
these studies to determine if the action taken has the expected result. If it does, 
the plausibility of the hypothesis is increased. For example, if a chemical carcino- 
gen were isolated and removed from tobacco, it would be of tremendous interest 
to observe the mortality from lung cancer during the next 25 to 50 years. With 
proper attention to other possible variables, changes in the time trend of lung 
cancer mortality would be of considerable interest. 

The foregoing is a somewhat oversimplified and generalized presentation 
of the sequence of events and trains of thought, but it may provide ideas as to 
how the various types of studies discussed fit into a general operational pattern. 
It should be emphasized that studies of disease in human populations do not 
exist in vacuo and do not merely provide “statistical” relationships. These
studies interact with clinical and laboratory investigations, since each plays a 
role in the attempt to determine, as completely as possible, our knowledge of the
nature of disease in human beings.


POPULATION STUDIES AND THE NATURAL HISTORY OF DISEASE 


Our discussion has been concerned solely with the study of the distribution 
of the disease in the population, particularly with respect to determining etiologi- 
cal factors. There is one important aspect of studying a disease in a population 
that should be mentioned, although it is neither directly related to determining
etiological factors nor directly concerned with population distribution.

Both the clinician and the epidemiologist are interested in the natural history
of a disease, which includes consideration of the spectrum of manifestations of the 
disease from the earliest determinable stage and the course of the disease. Cus- 
tomarily, knowledge of the natural history of disease is obtained by study of 
patients who are receiving medical care. It should be emphasized that information 
concerning the natural history of disease obtained through such means may not 
provide an unbiased and correct picture. One may readily perceive that patients 
receiving medical care may represent individuals with disease of a certain degree of 
severity. To obtain complete and unbiased knowledge of the natural history of a 
disease, however, it is necessary to have information on all individuals with the 
disease, which necessitates a population study. 

The best example of the importance of population studies in this regard is 
that of histoplasmosis, the history of which was recently reviewed.'® Up until the 
early 1940s histoplasmosis had been considered a rare tropical disease which 
was uniformly fatal. In the past two decades, observations made on various
population groups radically altered our concept of the disease. The results of
these studies indicated that histoplasmosis was essentially a fairly common disease
in certain sections of the United States and was rarely fatal. The initial concept of
the disease was based on observations of the small proportion of infected individuals
who developed relatively severe manifestations of the disease and
who represented the “visible portion of the iceberg,” whereas those infected
individuals without clinical manifestations represented the “submerged portion
of the iceberg” and the vast majority of those with the disease.

Our ability to determine the entire spectrum of disease will, in most in- 
stances, depend upon the availability of methods of detecting those diseased 
individuals who have no clinical manifestations. In many diseases of current
public health and medical interest this is not yet possible.


In 1949, at the New York Academy of Sciences, there was held a conference on
“The Place of Statistical Methods in Biological and Chemical Experimentation.”1
Those who presented papers included workers in the pharmaceutic industry,
chemistry, and the Food and Drug Administration and two medical
investigators who spoke about controlled clinical trials. Audience discussion 
showed familiarity with modern ideas of design and analysis, except for the 
questions asked by a doctor who revealed the lack of understanding, and consequent
suspicion, of controlled trials which at that time prevailed in medicine.

In 1958, at a meeting of the Endocrine Society in San Francisco, a panel
of six cancer specialists was presented with case histories of patients who had
mammary or prostatic cancer, and on each case the panel was asked to express
opinions regarding appropriate therapies. Differences of opinion were, of course,
considerable, and panel members voiced their belief that the only sound basis
for opinion was a controlled trial, double blind if possible, with appropriate
stratification, randomization of patients to the therapies under test, and proper
analysis of the results.


A STATISTICAL ATTITUDE 


The contrast between the 1949 conference and the 1958 panel-meeting 
illustrates a change that has become widespread in clinical medicine during the 
intervening years. The 1958 attitude is a “statistical” attitude, whether we
think of statistics as the principles that experimenters have used for centuries
to obtain valid results in the face of omnipresent variation in the material studied,
in methods of measurement, and in observers or whether we restrict the term,
artificially, to some methods, such as randomization, which workers who bear
the label “statistician” have recently introduced into experimentation. Whatever
we may call the concepts and methods, the striking feature is that now they
are preached and practiced by an increasing number of clinical investigators, 
whereas a decade ago anyone who advocated them was commonly regarded as 
an aberrant specimen. A notable illustration of this change is the Report of the 
‘Symposium on Clinical Trials,’ held at the Royal Society of Medicine, London, 
in 1958.2 Only one of the nine participants who presented papers was by title a 
statistician; the others were physicians and surgeons. 

To one who in the 1920s had the good fortune to be introduced to the study 
of drugs by an experimental pharmacologist instead of by a teacher of materia 
medica, the spread of the experimental attitude and method from the laboratory 
to the clinic appears to be a natural development—something that would have 
occurred even if there had been no contemporaneous development, in other applied
sciences, of the body of principles and practice called “experimenters’
statistics.”

The controlled clinical trial has, however, benefited from the participation 
of experimental statisticians—their familiarity with variation and methods to 
reduce it and measure it, their sensitiveness to sources of bias in biologic experi- 
ments, their knowledge of techniques to reduce such bias, and their introduction 
of strict randomization to permit reliable inferences in spite of the hidden biases 
that remain in all experiments. 

The growing understanding of these concepts and methods by clinical 
investigators is some consolation to those of us who are appalled at the widespread 
evil effects of statistical tests applied without real understanding in medical 
research. Whether laboratory and clinical investigators or statisticians, profes- 
sional and amateur, are more to blame for this evil is immaterial. We should not 
look backward but forward, to the possibility that an understanding of the role 
of statistical thought and action in clinical trials will reduce the perversion of 
statistics elsewhere in medical research. The first step is to realize that the method 
of controlled trials is still in its infancy; that, although the principles are simple, 
the art is extremely difficult; and that all of us, clinicians and statisticians, have 
a very great deal to learn about the art, especially in its most difficult form, the
multiclinic trial.


SOURCES OF GUIDANCE 


An art cannot be learned merely by reading about it, but reading is helpful. 
Excellent general guides to controlled trials have been written by pioneers such 
as Bradford Hill’: and Daniels,> and a summary of their writings may be useful 
as an introduction.* What we need now is not a repetition of these general introductory
statements but more details of controlled trials in various fields, as in
the Report of the Royal Society of Medicine Symposium.’ Above all, we need 
discussions of difficulties*—a public sharing of experiences by those who have
met the difficulties, with suggestions regarding methods of avoiding or overcoming
them. That is the purpose of this paper; and, although it will take illustrations
largely from multiclinic trials, the principles and many of the details apply to
single-clinic trials as well.

*A recent article by Greenberg,’ although written primarily for statisticians, is enlightening for
anyone who contemplates participation in a multiclinic trial. A book® edited by Waife and Shapiro,
which appeared after this article was submitted, can be most highly recommended for study not only
by investigators but by practicing physicians who desire criteria by which they can decide how
thoroughly a new therapeutic agent has been tested.


MURPHY’S LAW 


A few years ago, when this department became involved in a ten-clinic 
trial, I thought it would be instructive to ourselves and to others if we made a
list of all the difficulties that we met in the trial; but I soon discovered that the
effort was unnecessary, because “Murphy’s Law” was found to operate. Apparently
nobody knows who this particular Murphy was, but his law is quoted
by theatrical producers when staging a play: “If something can go wrong, it
will.”

Instead of listing problems and difficulties, therefore, I shall attempt to answer
the question, “What are some of the precautions that will prevent Murphy’s
Law from operating quite so frequently as it commonly does in clinical trials?”
No complete, or completely satisfactory, list of precautions can yet be given; 
but at least some general guidelines can be offered under the following heads: 
(1) choice of investigators, (2) time for planning, (3) realism in planning, (4) 
carrying out the plan, (5) practice, (6) permanence of investigators, (7) sample 
sizes and case loads, and (8) the policeman. 

After making this list, I was interested to see that many of its components
had been mentioned or implied in the words of Daniels,® a physician who had early 
experience with cooperative trials. He wrote as follows: “Such trials are not 
easy to organize. They demand very careful planning; loose plans, loose methods 
give loose results, which are just as equivocal as the impressions of a single 
clinician and may be more misleading, since a semblance of scientific enquiry 
is presented. The trials demand close co-operation between clinicians and patho- 
logists of senior standing. They demand trust by these experts in the judgement 
of an independent person who has had no hand in the clinical treatment of the 
patients. From such work, no prestige is gained by any single individual. But, 
given all these things, the results abundantly justify the effort expended. Such 
methods give, within a year or two years, clear results in a field where unorganized 
clinical work might not reach a conclusion in less than ten years.”


CHOICE OF INVESTIGATORS 


There are some people who may be very good investigators and who may 
possess very pleasing, friendly personalities, but who are not suitable to par- 
ticipate in multiclinic trials, or even in cooperative experiments in their own 
clinics. In fairness to his fellows, anyone who contemplates participating in a 
cooperative trial should first ask himself such questions as these: “Am I really
interested in therapeutic testing, or is my real motive for joining the trial some- 
thing else, such as the opportunity to make ancillary observations on the subjects 
in the trial, e.g., biologic effects of newly synthesized compounds? If my primary
interest is not in the trial as such, shall I be willing to obey to the letter
the rules that my teammates and I have laid down (or have accepted) for the
trial? If, during the trial, I think I see a promising lead that I would like to
follow, will I let this impulse injure the trial in any way?”

An investigator can easily injure a trial; but a trial can also injure an inves- 
tigator. Even a succession of single-clinic trials, running through several years, 
can deprive a research worker, at a crucial point in his career, of the opportunity 
to show his originality as an independent investigator.


TIME FOR PLANNING 


Daniels wrote of “loose” plans. “Tight” plans take time to create, much
more time than is dreamed of by those who have not previously taken part in a 
well-planned trial. Experience of this department and of other participants in 
ten- or twelve-clinic cancer chemotherapy trials revealed that, when the planners 
have the usual other commitments of clinic chiefs and have not previously 
organized a well-planned trial, time estimates of the following order should 
be made: 

(1) At least a year should be allowed for drawing up the “protocol”
(the document containing the agreed plan) and the corresponding record 
sheets. 

(2) During this year, the planners (clinicians and one or more statisti- 
cians) should meet together for at least 100 hours, e.g., for an 8-hour day each 
month. 

(3) For every hour spent together, the planners must, individually, spend 
several hours (sometimes many hours) seeking answers for certain questions. 
This may entail a search for published or unpublished data or actual exploration 
on patients. For example, one member of the planning group may state that, 
for testing functional change in a patient with cancer (or rheumatoid arthritis), 
a certain scheme is satisfactory. The others may not have tried it or may have 
tried it and disapproved. The planning stage is often a far better time to test it 
than is the trial period itself. 

Single-clinic trials differ so greatly in type and complexity that a time 
estimate must be made for each trial individually. To do this, the planners 
should write down all the individual points on which a decision must be reached 
before the trial starts. They should then estimate the amount of time required 
to discuss each point (or to pursue the necessary inquiry), and finally they should 
multiply their estimate by a factor of two or more. This may sound like a facetious 
exaggeration, but to any laboratory investigator who has worked on projects as 
complicated as are clinical trials the advice will seem soberly realistic. 
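
The rule of thumb in the preceding paragraph can be written out as a small calculation. The Python sketch below is illustrative only: the decision points and the hours attached to them are invented for the example, and the function name `padded_planning_hours` is hypothetical. Only the padding factor of two or more is taken from the text.

```python
def padded_planning_hours(point_estimates, safety_factor=2.0):
    """Sum the planners' per-point estimates and pad the total.

    point_estimates: mapping of decision point -> estimated hours
    safety_factor:   the text advises "two or more"; smaller values are rejected
    """
    if safety_factor < 2.0:
        raise ValueError("the advice in the text is a factor of two or more")
    raw_total = sum(point_estimates.values())
    return raw_total * safety_factor


# Hypothetical decision points and hour estimates, for illustration only.
estimates = {
    "admission criteria": 6.0,
    "stratification": 4.0,
    "assessment methods": 8.0,
    "record sheets": 5.0,
}
planned = padded_planning_hours(estimates)
print(f"raw estimate: {sum(estimates.values()):.0f} h, planned: {planned:.0f} h")
```

The point of writing the list down first is that the padding is applied to an explicit total rather than to a vague impression.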

As an illustration, there may be mentioned a preliminary memorandum 
recently sent to a group of rheumatologists who wish to explore the feasibility 
of multiclinic trials in rheumatoid arthritis. Of the 112 questions in the memo- 
randum, 105 deal with the type of patient to be admitted to the trial and the 
division of admitted types into subgroups (stratification). The numerous decisions 
to be made regarding methods of assessment, criteria of improvement, and procedures
throughout the trial were postponed for later memoranda.


REALISM IN PLANNING


We have just asked for realistic time estimates; but hundreds of hours 
spent on planning are useless unless the planning itself is realistic. A group of
planners may sit around a table in New York, New Orleans, or San Francisco
or even in their own institution and imagine that they are visualizing adequately 
the conditions in their clinics, private practices, or hospitals; but when they 
actually get into the trial they commonly find that their vision around the con- 
ference table was not at all clear or detailed. Sometimes the chief of a service has 
not nearly such a detailed knowledge of difficulties as have some of his assistants. 
Here are three of the spots where realism, perhaps better called realistic imagi- 
nation, is essential. 

(1) Patient-Doctor Relationships.—All ten clinic chiefs in a planning committee
may say, quite honestly and ethically, “We don’t really know whether
treatment x, although widely used, is any better than a placebo. Therefore we
shall, in a double-blind trial, test x against a placebo.” But what will their clinical
upbringing urge them to do when a patient with rheumatoid arthritis comes to
them in tears and begs for something to help her? Fortunately there is always
aspirin as a basic treatment in rheumatoid arthritis; but how will a cancer investigator
feel when what appears to be an “unduly large” number of patients
are going downhill and he knows that the chances are 50:50 that any particular
patient is on the placebo? Will he not feel that perhaps, in spite of his doubt,
treatment x (or something else that he might try) might have some benefit?
What will he feel like if the residents start to call the trial “the kiss-of-death
experiment”?

Such eventualities must be guarded against by writing into the protocol 
a proviso such as: “When, in the opinion of the investigator, continuation in
the trial may endanger the patient, the treatment shall be discontinued and the
case shall be counted as a failure for that treatment.” But this is only a partial
solution of the problem, because if we know that half our patients are receiving a 
placebo, shall we not be more ready to infer failure than if we knew that all were 
on therapies that might benefit them? If so, we may, by taking patients off 
therapy, readily mask a real difference between x and placebo. 

If this department takes an unreasonably jaundiced view of placebos in 
cancer trials, it arises partly from an experience in one trial. One statistician with 
clinical training affirmed that he himself would not be inclined to compare x 
with placebo. Realizing, however, that he had overstepped his functional bound- 
ary as statistician, he accepted the clinicians’ decision in favor of a placebo, only 
to find, when the trial was under way, that his department was accused of in- 
sisting on the placebo! 

(2) Auxiliary Personnel.—The chief of a clinic may be a skillful, research- 
trained physician; but the success of a trial depends not on him alone, but on 
all who contribute to it—residents, interns, nurses, laboratory workers, medical 
secretaries, and others. Usually the training of such persons does not fit them, 
without special instruction and supervision, for participation in research. More- 
over, those of them who come into contact with patients are likely to think that 
to do something to a patient is necessarily better than to do nothing to him.

Some experts in rehabilitation realized that they had no real proof that
many of their procedures benefited cerebrovascular accident patients and there- 
fore proposed to withhold those procedures from half the cases; but they found 
that they could not carry out their plan because their residents rose in arms 
against it. 

Nurses, especially, are apt to frown upon placebos. If they know that one is 
included in a trial and they think that a patient is going downhill they may 
speculate, aloud and in front of the patient, regarding the possibility that this 
is one of the placebo cases. 

A few years ago, when pediatricians began to suspect that oxygen admini- 
stered to premature babies was a possible cause of retrolental fibroplasia and 
consequent blindness, a very carefully designed trial was organized in which 
half the babies received the usual (liberal) amounts of oxygen and the other 
half received restricted amounts. At the outset the nurses were horrified at the 
idea that any babies should be deprived of liberal oxygen; but as the trial con- 
tinued and the suspicion that oxygen was dangerous increased in that clinic 
and elsewhere, the nurses changed sides completely and felt that it was criminal 
to give liberal oxygen to any babies.* 

Before entering a trial, every investigator should mentally check every link 
in his chain of personnel—his superiors (for hospital politics can play havoc 
with any research), himself, his medical associates and assistants, nurses, labora- 
tory and x-ray workers, secretaries, and others—and ask himself how to prevent 
each potential weakness. Do the nurses in this trial need to know that a placebo 
is being given, or different doses, or even that a trial is in progress? In a complex 
treatment, such as physiotherapy, can we reduce or omit one element at a time 
without incurring opposition? How can we most efficiently test laboratory ac- 
curacy by submitting duplicate specimens or specimens with a predetermined 
concentration of the substance to be assayed? 

(3) Work Load.—In one multiclinic trial in which the treatment period was 
4 weeks, it was decided that, after the pretreatment assessment, a reassessment 
by clinical and laboratory methods was to be made at the end of each week—five 
assessments in all. This decision stood without question during more than 12 
months of the planning period, until, at a meeting of all investigators, one of 
them (who had most strongly favored the five assessments) said that he had just 
calculated the number of man-hours required for these assessments and found 
that his clinic could not possibly meet the requirements. Others agreed; so the 
required assessments were reduced to two—one pretreatment, the other posttreatment.
At the next meeting, several months later, one investigator said
that he did not trust a single pair of observations; he wished to see a trend and
proposed that the requirement for weekly assessments be reintroduced. The
statistician, having recovered from shock, assured this investigator that he could
make as many assessments as he wished and that they might give useful hints
for further work, but that only the assessments that all investigators were able
to make (the lowest common denominator) could be used in the cooperative-trial
analysis.

*I am indebted to my colleague, Dr. Jonathan T. Lanman, Associate Professor of Pediatrics, for
this illustration.

MAINLAND, J. Chron. Dis., May, 1960

The whole problem of work load must be considered realistically. Daniels® 
wrote of “cooperation between clinicians of senior standing”; but the term “suitable
clinicians” would perhaps be better. Suitability varies somewhat with the
type of assessment; for example, if standard tests are to be used, a conscientious
junior clinician is suitable, with the senior clinician participating less fully. The
“suitable” clinician must be relieved of all duties that would interfere in any
way with the trial.


CARRYING OUT THE PLAN 


The Reading of Instructions.—“It is no use giving them many instructions;
they won’t read them.” This was the remark of a clinical member of a committee
that was organizing a survey in which twenty-nine clinics were asked to record
on data sheets certain pieces of information from their case records.

It would ill become me to criticize anyone for hating to read instructions. 
Even when I assemble a new instrument my strongest impulse is to glance at the
instructions and then proceed by trial and error; but I have learned that one
“slight error” or “slight omission” can wreck an instrument. Probably most of
us, looking over data that we have recorded some time previously, have been 
dismayed by our errors of omission and commission; therefore we may picture 
the state of mind of a statistician who is at the receiving end of records from a 
dozen investigators and finds 30 per cent or more of the records defective in one 
way or other. He is appalled by the risk of bias (whether he discards
or retains a record), by the raggedness and uncertainty of any analysis which,
with vastly multiplied labor, he may perform, by the wastage of time and effort
devoted to the study, and by the fruitless inconvenience or suffering of the patients
who were used in it. Behind all these thoughts he sees a specterlike question:
“If these errors are detectable, what other errors, quite undetectable, may lurk
in these data?”

The only way to reduce errors is to have each record (and each copy of it)
checked immediately after it is made, by a meticulous person who scrutinizes
each word, figure, and statement and asks: “Is this clear, correct, and complete?”

Chipping Bits From the Protocol.—In a certain clinical trial it was agreed
by all investigators that a particular chemical determination was to be made on 
at least two blood samples, taken from the patient not less than 24 hours apart. 
After 30 or 40 patients had been used in the trial, one investigator reported that 
it was inconvenient and expensive for some of his patients to stay long enough 
in the city to fulfill this requirement. 

One answer to this difficulty would be: “Recompense such patients from the
research grant or contract.” But sponsoring agencies are naturally loath to allow
such payments because they open the door to abuse—heavy costs of hospitalization
and patient care illegitimately charged to research funds. The simplest
answer is: “Do not use such cases at all.” But when suitable cases are scarce
there is a great temptation to chip away bits from the protocol to accommodate
all sorts of special cases. This is dangerous enough when done openly with agreement
of all investigators; it is even worse when no one knows except the investigator
who does the chipping. Bradford Hill’ discussed this problem as follows:
“Every departure from the design of the experiment lowers its efficiency to some
extent; too many departures may wholly nullify it. The individual may often
think ‘it won’t matter if I do this (or don’t do that) just once’; he forgets that
many other individuals may have the same idea.”

Regarding the 24-hour interval problem, it is satisfactory to record that the 
committee of investigators decided to stick by the protocol, at least in public. 


PRACTICE 

Every experimenter knows how, in the first few of a series of complex experi- 
ments, he often runs into unexpected difficulties or makes unexpected mistakes 
even when he is familiar with the component techniques of the experiment. A 
clinical trial, even in one center and even if simple in design and observational 
method, is often a complex experiment because it usually involves teamwork; 
and I am coming to believe that, because of the initial difficulties and risks of 
hidden error, it would be best to make a rule that the first half dozen patients’ 
records from any clinic be automatically excluded in analyzing the results of a 
trial. Then I remember a trial that is now being planned in which it is said that 
no more than 6 suitable patients per year can be expected at each of twelve clinics, 
and a little arithmetic (6 - 6 = 0) leads me to a more feasible solution: pretest
the protocol, the record sheets, and personnel by a practice trial on a half dozen 
patients who will be somewhat similar to those who will be in the trial but not 
qualified for admission to the trial. This should be a complete, full-dress rehearsal, 
with all observations and tests made and recorded as in the actual trial. The 
fact that it is a practice trial should be withheld from as many persons as possible; 
and it will be still more effective if, instead of placebo-containing pills or capsules, 
some active but harmless drug can be introduced for several of the patients, on 
a double-blind scheme. If the test period in the real trial is to be several months, 
everyone is, of course, likely to be impatient in the practice trial; and, as a compromise,
it may be possible to give some clinics the “green light” after only 4 to
6 weeks of the practice trial.


PERMANENCE OF INVESTIGATORS 

Here is the experience, probably not uncommon, of a statistician employed 
full time in a ten-clinic trial: At the outset, in some clinics a choppy sea—errors 
and omissions in records, delay in sending them in, misunderstanding of the
protocol, requests for interpretation and guidance. Then, smooth sailing, followed
in some clinics by another choppy sea. Why the second disturbance? Some of 
the chief investigators, who had been on the planning committee, had at this 
point handed the trial over to colleagues or assistants who had not shared the 
planning. In one clinic half the quota of cases for the trial had to be jettisoned, 
and a fresh start had to be made. In a cooperative trial, such a delay holds everybody
back.

There is likely to be a succession of such shipwrecks if a trial is transferred
to a nonpermanent assistant, such as a fellow-in-training, who, when he leaves,
turns it over to another nonpermanent assistant.


SAMPLE SIZES AND CASE LOADS 


“How many cases shall we need?” That is one of the commonest questions
asked nowadays by clinicians who are planning a clinical trial—another sign of 
increased statistical thinking. It is a vital question, but it cannot be answered 
unless the investigators can first answer four other questions: 

(1) What assurance do you require that you will not be fooled by chance 
differences in outcome between the A-treated and the B-treated groups? 

(2) What difference in outcome, due to difference in treatment, would you 
consider important clinically? 

(3) If such a difference really exists, what assurance do you desire that the 
trial will demonstrate the presence of an A-B difference in outcome? In other 
words, what assurance do you desire that your experiment will not be a failure,
a waste of time, of effort, of money, and, perhaps, of patients’ suffering?

(4) What data have you on the outcome after either the A or the B treat- 
ment, e.g., percentage of improved cases or, if a measurement system is used, 
the average degree of improvement (and interpatient variation)?

If the answers to question 4 are little more than guesses, so also will be the 
estimates of sample size; but even without very dependable knowledge, the 
mere inspection of a table of required sample sizes (such as Table X of “Statistical
Tables for Use With Binomial Samples”’) can prevent us from rushing headlong
into a trial with samples too small to detect anything but a gigantic difference,
a difference that would be obvious without spending thousands of dollars to
detect it.
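
To show how the four questions feed a sample-size estimate, here is a minimal Python sketch. It is not the method of the tables cited above: it assumes the common normal-approximation formula for comparing two proportions, with equal group sizes and a two-sided test. The function name `cases_per_group` and the percentages used are hypothetical, chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate cases needed in each group to compare two proportions.

    p_a, p_b: expected proportions improved under treatments A and B
              (question 4 supplies one; question 2 supplies the difference)
    alpha:    two-sided risk of being fooled by chance (question 1)
    power:    assurance of detecting a real difference (question 3)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1) or p_a == p_b:
        raise ValueError("proportions must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_a + p_b) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return ceil(numerator / (p_a - p_b) ** 2)


# Illustrative figures only: 30% improved on A, and 50% or 40% improved on B.
for p_b in (0.50, 0.40):
    n = cases_per_group(0.30, p_b)
    print(f"A = 30%, B = {p_b:.0%}: about {n} cases per group")
```

Running the sketch with a smaller difference shows the required numbers rising steeply, which is the lesson the tables teach: a trial sized by guesswork can detect only very large differences.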

Another lesson regarding sample sizes that many of us have learned the
unpleasant way is: Beware of optimism regarding case loads. Having estimated
from our impressions and even from inspection of past case records the numbers
of patients, suitable for a certain trial, that we may expect during the period of 
the trial, the next step is to divide our estimate by at least two. This makes some 
allowance for fluctuations and for the restrictions of type of case that are usually 
necessary to reduce heterogeneity in a clinical trial; but even the divisor two 
would not have helped the clinician in a cancer therapy trial who estimated 
with confidence that he would have 20 suitable cases a year, and then secured 
only 3 cases during 18 months of the trial. 
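
The effect of deflating case-load estimates on the length of a trial can be sketched in a few lines of Python. The figures (180 cases needed, twelve clinics, 6 expected cases per clinic per year) and the function name `years_to_enroll` are hypothetical; only the divisor of at least two comes from the text.

```python
def years_to_enroll(cases_needed, clinics, estimated_per_clinic_per_year,
                    divisor=2.0):
    """Years of intake needed once optimistic case-load estimates are deflated.

    divisor: the text advises dividing each estimate by at least two
    """
    if divisor < 2.0:
        raise ValueError("the advice in the text is a divisor of at least two")
    if clinics <= 0 or estimated_per_clinic_per_year <= 0:
        raise ValueError("clinics and case-load estimates must be positive")
    realistic_per_year = clinics * estimated_per_clinic_per_year / divisor
    return cases_needed / realistic_per_year


# Hypothetical trial: 180 cases needed, twelve clinics, each expecting 6 a year.
naive = 180 / (12 * 6)
deflated = years_to_enroll(180, clinics=12, estimated_per_clinic_per_year=6)
print(f"naive: {naive:.1f} years; after halving the estimates: {deflated:.1f} years")
```

As the cancer therapy example shows, even a divisor of two can be far too generous, so the result is best read as a lower bound on the time required.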


THE POLICEMAN 


In every clinical trial there should be someone who is responsible for keeping
the experiment running well. In multiclinic trials he may be called the “coordinator,”
or he may be the chairman or some other member of the “coordinating
committee.”

Whatever the title, the duties must be those of a policeman, to prevent 
trouble, to catch it quickly when it starts, and to be always ready to give 
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immediate help when it is needed. This person should know the project thoroughly 
and the special features—difficulties, attitudes, and behavior—of the various 
participants. Looking at anything that occurs during the trial—for example, 
a departure from the protocol, an omitted or doubtful item in a set of observa- 
tions, a decision not covered by the protocol regarding any particular patient 
in the trial—the policeman should be able to say what effects this event will, 
or may, have on the conclusions drawn from the whole trial. The policeman’s 
duties often fall upon a statistician, mainly, it appears, for two reasons. 

The first reason is that a statistician, a practical or “applied” statistician,
has by training and more importantly by his turn of mind and by his experience
acquired a knack of seeing or sensing interrelationships between events A, B,
and C under actual or possible conditions X, Y, and Z. Automatically his mind
formulates the questions: “Will, or could, this event or circumstance in this
context cause bias, so that we may wrongly attribute a difference in outcome to
a difference in therapy or fail to detect a real difference due to therapy? Or will
the event or circumstance increase the interpatient variation in outcome and
thereby mask differences due to the different therapies? Or is there the double
risk of bias and of increased interpatient variation? What can we do to protect
the experiment against these risks?”

Ordinary clinical training does not develop this knack or this attitude; 
neither does laboratory experience commonly or automatically fit a person for 
the perplexing world of clinical trials. And yet, the label “statistician” or even
“biostatistician,” “biometrician,” or “medical statistician” is no guarantee of
the presence of the attitude or the necessary experience. If such a label were a
guarantee, how could we account for clinical trials designed with statisticians’
aid and called “double-blind” although all bottles containing one drug were labeled
“A” and all bottles containing the other drug (or placebo) were labeled “B”?

On the other hand, I have learned much both by reading and discussion
from clinicians who have displayed a “Sherlock Holmesian” knack of ferreting
out weaknesses in experiment design, risks of bias, and undesirable variability;
and I hope to see the day when more of them take on the policeman’s duty in
clinical trials. Not only have they knowledge of minutiae, including personality
crosscurrents in clinical situations, which a statistician without medical training
takes a long time to acquire, but they are of superior “pecking order,” even
superior to that of a doctor who has acquired the label “statistician.”

The second and perhaps the chief reason why police duties often fall on 
statisticians rather than on clinicians is the time required to carry out the duties 
efficiently. Even in single-clinic trials the time required, free from other pressing 
problems, for study of details, for thought instead of snap judgment, and for 
constant contact with the work in progress is much greater than is realized by 
many clinicians who invite a statistician to help them. In a multiclinic trial I
believe that, at this stage in development of the method, the affairs of the trial
should be the primary, and sometimes the sole, duty of one policeman. For
example, this department is the statistical center for a ten-clinic trial in which a
succession of two-drug comparisons is made, with evaluation at the end of 4
weeks. Fewer than 100 patients are under study at any one time, but an experienced
biometrician is occupied full time on the project, along with a full-time
assistant for tabulation and secretarial work; and in the background is a senior
statistician for consultation and sometimes for consolation.*

With what do these two full-time statisticians occupy their time, besides 
preparing and dispatching record sheets, distributing random-allocation schemes, 
receiving records, tabulating and analyzing results? The answer is that Murphy’s 
Law takes care of their remaining hours. Here are four examples: 

(1) A patient cannot be included in the study unless a biopsy specimen, 
taken before treatment, has been certified as definite cancer by the pathologist 
appointed for the purpose. A certain patient has completed his course of 
therapy, but the biopsy report has not arrived. How many times should the 
biometrician write, or telephone long distance, before removing the case from 
the series? 

(2) Final assessment of each patient is to be made at the end of the fourth 
week of therapy. Three reports from the State of Iowa in a drug-placebo trial 
did not arrive at the expected time, and the biometrician was informed that the 
assessments could not be made until perhaps 2 to 4 weeks after the due date and 
termination of therapy, because the men had gone out harvesting. 

The biometrician called the consultant statistician, who, being an east- 
coast city dweller, reasoned as follows. To qualify for the trial, all patients have 
to show actively progressing cancer. Therefore, these 3 patients must, at least, 
have been feeling better than at the beginning of therapy. Then the consultant 
statistician was informed that in Iowa, unless a man is dead and buried, he may
go into the fields when the harvest season arrives. 

Anyone who would like the policeman’s job in a clinical trial could, therefore,
test himself on the following questions: What might be the effects on the conclusions
from this trial of (a) automatically excluding these 3 patients from the
analysis, (b) including them as “improved,” (c) including them as “not improved”
or “worse,” (d) deciding to exclude or include by tossing a coin for each patient,
(e) including them by the assessment made after a month’s delay? How would
the answers to these questions be affected if (a) all 3 patients were on placebo,
(b) all 3 were on the drug, (c) 2 were on placebo and 1 was on the drug, (d) 1
was on placebo and 2 were on the drug? How would the choice of decisions be
limited if the assessments are not in terms of “improved” and “not improved”
but by increase or decrease in a certain blood chemistry value? Since the sowing
season in Iowa is as good a corpse-reviver as the harvest season, would it be desirable
to limit the drug testing to men who are not farmers or to conduct the
trials at other seasons of the year? (If a candidate for police duty is dismayed by
these questions, he may be consoled by the statement that 30 years’ battling
with bias and variability do not ensure an easy or unequivocal victory in every
combat.)
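
A candidate who wishes to work through the first of these questions may find a small calculation helpful. The Python sketch below compares the drug-placebo difference in proportion improved under three of the possible decisions. All the counts are hypothetical, invented for illustration only, and the function names are not part of any trial protocol.

```python
def rate_difference(drug, placebo):
    """Proportion improved on drug minus proportion improved on placebo.

    Each argument is a pair (number improved, number of patients).
    """
    return drug[0] / drug[1] - placebo[0] / placebo[1]


def decisions(drug, placebo, missing_drug, missing_placebo):
    """Drug-placebo difference under three ways of handling unassessed patients."""
    d_imp, d_n = drug
    p_imp, p_n = placebo
    return {
        "excluded": rate_difference(drug, placebo),
        "counted as improved": rate_difference(
            (d_imp + missing_drug, d_n + missing_drug),
            (p_imp + missing_placebo, p_n + missing_placebo),
        ),
        "counted as not improved": rate_difference(
            (d_imp, d_n + missing_drug),
            (p_imp, p_n + missing_placebo),
        ),
    }


# Hypothetical trial: 20 of 40 improved on drug, 12 of 40 on placebo,
# and all 3 unassessed patients were on placebo.
for label, diff in decisions((20, 40), (12, 40), 0, 3).items():
    print(f"{label}: {diff:+.3f}")
```

Rerunning the last lines with the 3 patients placed on the drug, or split between the two groups, shows how far the apparent effect of therapy depends on a decision that the protocol never anticipated.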

*Clinicians’ lack of experience, inclination, and time are not the only reasons why statisticians are
apt to find themselves on police duty. A medical school can hire an experienced statistician for less
than half the salary of a clinician who would be suitable for the job.

(3) To ensure the double blindness of the trial, each bottle containing placebo
or drug is labeled at the drug distributing center with a code number supplied 
by the biometrician. On one occasion the medical secretary, who was in charge 
of the drugs at a certain clinic, telephoned the statistical center to say that, 
although the bottles bore the code numbers, the placebo-containing bottles 
were in boxes labeled “placebo,” while the drug-containing bottles were in boxes
labeled with the name of the drug. The secretary had quickly hidden the evidence,
but she wondered if this had been done at other clinics also. Immediate
calls from the statistical center to the other nine clinics between the Atlantic and 
the Pacific prevented serious harm. 

(4) “I am not sure that the patient took the medicine.” Such a phrase, met
from time to time on case reports, touches on an extremely difficult problem, 
especially when the subjects of a trial are outpatients. Some planning groups seem 
surprisingly unconcerned about it, although it is known that in certain socio- 
economic-racial classes the whole family may share the patient’s medication. 
Until the problem is solved (if it ever can be completely solved), it would be wise 
for us to get into the habit of using such phrases as, “A comparison of outcome
after prescription [not administration] of drugs A and B.”

A clinical trial policeman would do well to recall that the Gilbert and Sullivan
policeman’s lot was “not an ’appy one.” In everyday life, most of us believe in
the police system and expect a policeman to give us help and protection. But
when he gives us a ticket, however justly, he becomes a “cop,” and the word
reveals our feelings.

The only way by which a clinical trial policeman can reduce the risk of 
such resentment is by personal contact with investigators, in order to develop 
confidence and mutual understanding; and in multiclinic trials that means 
much traveling for the policeman or the investigators or both. 


PROSPECTS 


The foregoing remarks have not covered all the kinds of problems that 
arise in clinical trials or all the advice that might be offered; but they may have 
already discouraged the reader as much as they have the writer himself. Perhaps 
the best advice for both the reader and the writer is to turn again to the first 
section of this article and to remember that an infant does not appear to be a 
very efficient organism.

DURING the past 50 years much of the world has experienced a dramatic
change in its pattern of illness. By far the greatest part of this development
has been the conquering of a wide spectrum of infectious diseases, and the
key to this conquest has been increased understanding of the causative
organisms. The earliest advances were dependent on attacking the organism
outside the host, by general and specific sanitary measures. This was quickly
followed by specific protection of the host by immunizations. Only relatively
recently have highly effective methods of treating the established disease in the affected
patient been achieved.

With these developments firmly established as either past history or current 
practice, a new set of challenges predominates. The chief contenders are now the 
chronic degenerative diseases and malignancy. At present, control is far from 
satisfactory, whether related to treatment of established disease, protecting the 
individual, or attacking the etiological agent. 

Although lack of basic knowledge of the processes involved in these diseases 
confounds the problem, it is widely felt that better application of presently 
available knowledge has great benefits to offer. In particular, it is felt that 
presently available therapy if applied at an earlier stage would greatly improve 
the picture. In line with this concept, preventive medicine has changed emphasis 
from activities with points of attack during prepathogenesis to early case-finding 
programs. 

The prominent place of early case-finding in current thinking in preventive 
medicine is indicated by the following policy statement of the Health Insurance 
Plan of Greater New York: “The cornerstone of preventive medicine in group 
medical practice is the periodic health examination. The term ‘periodic health 
examination’ is used here to include all preventive health services to apparently 
well individuals... at regularly stated intervals for the purpose of early detection 
of disease.”¹

Numerous reports have appeared urging increased early case-finding effort. 
These reports generally take the form of enumeration of conditions found or of
review of follow-up.?-*

Other reports express some doubt as to the value of these findings in disease
prevention. The editor of the New England Journal of Medicine writes, “The
case for early detection . . . is not completely open and shut. .. . The problem that 
confronts the physicians, patients and public alike is to decide how much value it 
has.’”® Crile® expresses a similar view: “I believe that the public and perhaps 
even the profession itself have been somewhat oversold on the importance of time 
in most diseases. .. . There are few diseases .. . in which it makes any difference 
to the ultimate results whether the condition is recognized today or two weeks 
from today.’ A stronger comment is referred to in an editorial in the Lancet: 
“. . . several distinguished physicians declared that routine examinations of this
kind are virtually useless for their intended purpose—namely, the detection and 
successful treatment of symptomless disease.”"’ A somewhat more cautious 
statement suggests that the matter is not yet decided and that perhaps early 
case-finding is important in some but not all diseases. ‘‘We still do not know 
whether case-finding and treatment in the presymptomatic stages of disease either 
diminish disability or postpone death for the majority of chronic diseases. .. . It 
may be that for some diseases, the diagnosis and treatment of early symptomatic 
disease will control such disease as effectively as its discovery and treatment in 
the presymptomatic stage.’’® 

These conflicting views suggest the necessity of a careful evaluation of these
programs. It is the purpose of this paper to describe the application of the principles
of evaluation to preventive services. A generalized evaluation model
applicable to all phases of preventive medicine will first be outlined. The specific
application of the model in evaluating early case-finding programs will then be
presented.


THE SCOPE OF PREVENTIVE MEDICINE 

The term “preventive medicine” has been used in the previous section in
connection with activities which prevent the initial occurrence of disease as well
as with activities which prevent the progression of disease. If preventiveness
is considered as a scalar quantity, it seems reasonable to speak of preventing
the initial occurrence as being “more preventive” than preventing the progression
of disease.

With this comparative concept in mind, all aspects of medical practice can 
be arranged in an ordered sequence, the position in the sequence representing 
the degree of preventiveness. This degree of preventiveness is defined by the 
extent to which the activity alters favorably the natural course of disease. The 
ultimate of this scale would be activities which completely eradicate the etiological 
agent of a disease, as, for example, malaria or yellow fever control programs. 
Specific vaccination would have only a slightly lower rating. Early treatment 
would in general rate higher on this scale than late treatment. As will be discussed 
later, however, this need not be so in every case, and both the natural course 
of a disease and the available therapy must be considered in determining the 
preventiveness of any program. Purely symptomatic or palliative therapy,
although an important part of medical practice, has only minimal effect in
changing the course of disease.

Following this line of thought, “preventive medicine” is understood to refer
to a philosophy of medical practice which emphasizes methods which attack 
the natural course of a disease. This philosophy does not exclude any type of 
medical service but attempts to incorporate all appropriate services in coordinated 
programs which are effective in altering the course of diseases. 

The term “preventive services” is generally used to refer to those services
which are highest on this preventiveness scale, and these are the services of
particular interest in this paper. However, the general outline for evaluation
which follows is applicable to all medical services. The title of this paper, “Evaluation
of Preventive Services,” will then be understood to mean evaluation of
the preventiveness of medical services.


EVALUATION—GENERAL MODEL 


A program evaluation, as the term will be used here, is a very special sort 
of research investigation and should be distinguished from demonstrations and 
program reviews. The latter two sorts of studies are similar in general purpose 
to the evaluation but less rigorous in design. 

An evaluation answers one or more of the following questions with respect 
to an experiment: 


(1) Does the program, as operating during the period of evaluation, ac- 
complish the objectives set down prior to this period? 

(2) To what degree does the program accomplish these objectives? 

(3) What amount of effort is required by the program to achieve the ob- 
jectives? 

The steps basic to answering this set of questions are: 

(1) Statement of objectives. Objectives are laid down before beginning the 
phase of the program to be evaluated. The statement of objectives includes a 
description of the measurements by which to gauge success. 

(2) Program. The program is a procedure which is intended to alter existing
behavior in such a way as to accomplish the stated objectives. The phase of the
program being evaluated is designed specifically for the objectives and is hoped
to be the best available method of accomplishing the objectives.

(3) Control. The control answers the associated implied question, “Would
the objectives have been equally accomplished in the absence of the program?”

(4) Measurement of effect. The effect is measured with relation to the stated
objectives and in terms of the initially prescribed gauge.


APPLICATION OF THE MODEL 


The four steps of the model as applied to evaluating the preventiveness of 
medical services follow readily from the preceding discussion: 

(1) The objective is to alter the natural course of disease in a favorable 
direction. 

(2) The program is a medical service or coordinated group of services and
is intended to be the best method, with current knowledge and available facilities,
of accomplishing the objective. In the later sections the program discussed as
illustrative of the method will be one of the early case-finding programs.

(3) The control is the natural course of disease as it exists with the normally
available medical care of the community.

(4) The measure of effect is the extent to which the natural sequence of
pathogenic events is favorably altered by the program.


NATURAL HISTORY OF DISEASE 


It will be apparent that an understanding of the natural history of disease 
is basic to the evaluation. The natural history of disease is used here to refer 
not only to the interrelation between the external etiological agents and the 
biologic response of the host but also to the effects of environmental factors, 
social and physical, to the community pattern of medical practice, and to the 
social and intellectual response of the host. Thus the natural history of chronic 
pulmonary insufficiency may be quite different in certain urban areas with 
excessive industrial air pollution as compared with the course in relatively 
nonpolluted rural areas. Similarly, the natural history of disease in a highly 
literate and medically sophisticated area may be different from the sequence 
observed in a remote region with generally poor medical facilities and little 
education as to health principles. 

These factors will be important in evaluations since the programs likely 
to be effective in altering the natural course will clearly be dependent on the 
characteristics of this course. In particular, some very simple programs may be 
highly profitable in the remote region mentioned above, but the same program 
might give no additional benefit to the natural course of events in the more 
advanced area. 

In the following paragraphs a schematic representation of the natural 
course of disease is presented. This representation is adapted to the considera- 
tions of early case-finding programs. These programs are potentially useful only 
after the disease has begun and before it has been diagnosed in the natural course 
of events. An evaluation of other preventive activities, applicable in the pre- 
pathogenic period or in the period following diagnosis, would require a somewhat 
different scheme, emphasizing different portions of the disease course. 

In Table I this natural sequence is indicated diagrammatically. A hypothetic 
disease as it affects an average individual in a specific environment is represented. 
Lengths of intervals along the sequence from A to D are proportional to the times 
between the indicated events. At the time of study, the most sensitive known 
detection device could detect this disease only after it had been present for a time 
AB. This average individual and his physician would not apply the detection 
test at B, so the diagnosis would not be known at this time. After an additional 
delay (BP), the disease would undergo some recognizable pathologic change. This 
might be the onset of symptoms, a change in some organ detectable on physical 
examination, or a change in some laboratory study. The event at point P would
also be overlooked or misunderstood, as a result of either patient or physician
delay, and the diagnosis would be made only at point C.

TABLE I. DIAGRAMMATIC REPRESENTATION OF NATURAL HISTORY OF DISEASE

[Diagram: a time line marked, from left to right, with biologic onset (A), positive detection test possible (B), pathologic change (P), usual time of diagnosis (C), and final outcome (D), with the critical point (X) marked on the line.]

Now let us suppose that at point X in this time sequence a critical event 
occurs in this disease. By a critical event it is meant that therapy instituted before 
this point is less difficult or more effective than therapy instituted after the 
point. More specifically, if the disease is curable, cure may be effected before 
the critical point but will be more difficult or impossible after the point. In the 
case of a self-limiting disease whose chief ill effect is a permanent deformity or 
other stigma, this permanent effect may be prevented by therapy before the 
critical point but is inevitable after this time. In the case of a chronic disease 
with no known cure, therapy before the critical point delays the progress of 
disease and postpones or lessens the severity of successive stages. Therapy begun 
after the critical point offers no advantage over the normal course, in which 
therapy is begun after point C, the time when the disease is diagnosed in its 
natural course. 

The last example in the preceding paragraph indicates that the critical 
point is defined with relation to point C in the natural history of disease. It 
will be apparent that there may be an additional critical point (X’) defined 
with relation to point X and falling to the left of point X (Table II). Therapy 
begun before X’ would result in a more favorable sequence than therapy begun 
at point X, but therapy begun after X’ and before X would offer no advantage
over therapy begun at point X itself. Similarly there may be arbitrarily many 
critical points, each critical with respect to the next critical point to the right 
or to point C. 

These points may correspond to recognizable pathologic changes as defined 
at point P (Table I), although this would not generally be the case. 

The critical points together with point C divide the natural history of disease
into a discrete set of segments. The concept of relative potency of preventive
activities discussed under the previous section, “The Scope of Preventive
Medicine,” may be defined in terms of these segments. Thus activities which
cause therapy to be started during any time segment left of point X are effective
preventive measures, and the farther the segment is to the left the more potent
is the prevention. Within any segment, however, no point has any advantage
over any other point.

TABLE II. MULTIPLE CRITICAL POINTS IN NATURAL HISTORY OF DISEASE

[Diagram: a time line running from positive detection test possible (B) to usual time of diagnosis (C), with multiple critical points (X, X’, X″) marked between them.]

A limiting case is that in which there are not discrete but continuous critical 
points. In such a disease, at every moment there is advantage to the future 
sequence of pathogenic events to begin therapy immediately. 

Employing this schematic concept of natural history of disease we can 
state certain conditions which must be fulfilled if an early case-finding program 
is to be a successful preventive measure for a given disease. 

(1) There must be a known effective therapy. 

(2) There must be a diagnostic device capable of detecting the disease before 
its usual time of diagnosis in the community being studied. 

(3) There must be one or more critical points, as defined above, in the natural 
history of the disease. 

(4) Such a critical point must occur in the time sequence of the disease 
after the time when diagnosis first becomes possible and before the time when 
diagnosis is made under the usual disease pattern of the community. 
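
The four conditions can be restated compactly as a single test on the time line. The Python sketch below is only an illustration: the class and field names are invented here, times are measured from biologic onset (point A) in arbitrary units, and a single critical point is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiseaseCourse:
    """Hypothetical summary of one disease in one community."""
    effective_therapy: bool           # condition 1
    detectable_at: Optional[float]    # point B; None if no detection device exists
    critical_point: Optional[float]   # point X; None if there is no critical point
    usual_diagnosis: float            # point C


def case_finding_can_help(course: DiseaseCourse) -> bool:
    """True only when all four conditions in the text are met."""
    if not course.effective_therapy:
        return False                  # condition 1 fails
    if course.detectable_at is None or course.detectable_at >= course.usual_diagnosis:
        return False                  # condition 2 fails
    if course.critical_point is None:
        return False                  # condition 3 fails
    # Condition 4: the critical point lies between B and C.
    return course.detectable_at < course.critical_point < course.usual_diagnosis


# Hypothetical courses: the critical point falls after B in the first
# and before B in the second.
print(case_finding_can_help(DiseaseCourse(True, 2.0, 3.5, 5.0)))
print(case_finding_can_help(DiseaseCourse(True, 2.0, 1.0, 5.0)))
```

Because the usual time of diagnosis is a property of the community as well as of the disease, the same disease may pass this test in one setting and fail it in another.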

In addition to these basic requirements some quantitative estimates can 
be made from the natural history diagram as to the probability of success of an 
early case-finding effort for a given disease. 

(1) The longer the time between points B and C in Table I, the greater is the 
likelihood of diagnosing cases early by instituting a case-finding program. As an 
example, if some screening test is capable of detecting a disease more than a year 
before the time it is normally discovered, annual applications of the test will 
routinely discover disease earlier than usual. If, however, the test becomes
positive only 3 months before symptoms would lead to its diagnosis, upon applying
the test annually three-fourths of the cases will still be diagnosed as a result
of symptoms which develop within a year following a negative test and only one-fourth
will benefit by the screening. Here results could be improved by screening
at more frequent intervals.

(2) The longer the time interval between points B and X″ in Table II,
the greater likelihood there is of achieving the maximum benefit possible from 
early case-finding approaches. The probability of achieving any prevention at 
all is proportional to the length of the interval BX. This probability is less than 
the probability of finding cases early, dependent on BC; that is, the number of 
cases found earlier than usual will in general be more than the number whose 
prognosis is benefited, since some may be found earlier than usual but still not 
early enough to be helped. 
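
The arithmetic behind these two estimates can be sketched in Python. The sketch assumes that the test is applied at fixed intervals, that the time at which the test first becomes positive is unrelated to the testing schedule, and that the test never misses a case once point B is passed. The function name and the 1-month value for the interval BX are hypothetical.

```python
def fraction_within_window(window_months: float, interval_months: float) -> float:
    """Share of cases in which a periodic test falls inside a window.

    window_months   : length of the interval of interest (BC or BX)
    interval_months : time between successive applications of the test
    """
    if window_months < 0 or interval_months <= 0:
        raise ValueError("window must be non-negative and interval positive")
    return min(1.0, window_months / interval_months)


# Example from the text: BC of 3 months with annual testing.
found_early = fraction_within_window(3, 12)

# Hypothetical BX of 1 month: fewer cases are helped than are found early.
helped = fraction_within_window(1, 12)

# Testing every 6 months instead of annually.
found_early_semiannual = fraction_within_window(3, 6)

print(found_early, helped, found_early_semiannual)
```

With a 3-month interval BC and annual testing the function gives one-fourth, in agreement with the example in the text, and halving the testing interval doubles that share.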

It should be noted again that the natural history events described are 
dependent on the environmental setting, including the generally available 
medical care, as well as on the biologic characteristics of the disease. In a specified 
community a new case-finding program may be unprofitable only because an 
equivalent procedure is already a generally accepted part of the community’s 
natural disease course. The time of usual diagnosis, the sensitivity of screening
tests, and the effectiveness of therapy will all vary with education of the population,
alertness of physicians, and scientific developments in medicine.

The following conditions illustrate some of the patterns in which critical 
points may occur. 

(1) Presbyopia is a disease in which there are no critical points, not because 
of lack of screening tests or lack of therapy, but because presently available 
therapy alleviates symptoms only and does not alter the course of the condition.'” 

(2) Carcinoma of the cervix is a disease in which there appears to be a single 
critical point occurring after the development of a positive screening test. The 
critical point occurs at the time when a distant metastasis becomes viable or 
when malignancy becomes inoperable as a result of local extension.'! 

(3) Carcinoma of the breast has proved particularly disappointing from the 
preventive medicine point of view. The mortality rate for carcinoma of this site in 
New York State remained virtually unchanged from 1931 to 1956 despite exten- 
sive efforts to educate the public as well as physicians to be alert to all suggestive 
lesions.’ This and other similar observations have led some to the conclusion 
that certain malignant conditions are predetermined in their outcome and are 
not subject to any benefit from early case-finding efforts. In terms of the natural 
history diagram this would be described by considering that metastasis in breast 
cancer occurs either before point B or after point C. Early case-finding would 
be too late in the former type, since the disease is diffuse before any known 
test becomes positive. In the latter type, early case-finding would be unnecessary 
because diagnosis would be made soon enough without any special effort. This 
result might hold true in areas with a good general level of health habits, while 
in a less alert area a preventive program might be indicated. 

(4) Diabetes mellitus may be a disease in which progression of irreversible 
vascular change is continuous so that every improvement in time of starting 
therapy is beneficial in slowing this natural disease sequence. This would be 
represented by a continuous succession of critical points rather than a discrete 
set of such points.” 


ANALYTIC STUDY OF NATURAL HISTORY 


In order to make measurements of some of the intervals in the natural 
history of disease, it is first necessary to determine stages of disease which are 
identifiable in individual patients. 

Identification of Stages of Disease. The interval from biologic onset to
development of the first diagnostic indication (AB in Table I) is in most diseases 
a purely speculative time period. In some diseases its duration can be determined 
by identifying contact with an etiological agent. A known congenital disease, 
for example, Huntington’s chorea, can be said to have had its biologic onset at 
time of conception. 

Individuals with disease in the stage represented by the interval BC (from
development of a positive test until time of usual diagnosis) can be identified
by a screening survey of previously undiagnosed people. Similarly, individuals
in the interval CD (from diagnosis to final outcome) can be identified from the
medical history, and it is this interval that is most frequently discussed in disease
descriptions.

The various critical points, X, X’, X″, etc., are in general not readily identified.
Only when they coincide with some recognizable pathologic change is
identification possible. Thus a critical point in most malignant conditions is 
identical with development of a metastasis. In the more common situation, when 
a critical point is not clearly related to an observable pathologic change, it is 
possible only to determine whether one or more critical points exist between two 
successive pathologic changes. As an example of this, it may be demonstrated 
in diabetes mellitus that treatment begun at the time of development of a positive 
glucose tolerance test is more effective in delaying late vascular change and death 
than treatment begun at the first appearance of a retinal vascular change. 
Therefore, there is at least one critical point between these two identifiable 
events in the disease course. We cannot say, however, whether a given patient 
has passed this point or not. 

It will be apparent from the preceding paragraphs that it is of basic impor- 
tance in describing natural history of diseases to have clearly defined stages 
of disease which can be identified by examination of individual patients. An 
example of such a staging is the now familiar use of retinal vessel changes for 
grading of hypertensive vascular disease. It is urged that similar stages be 
developed for all of the common chronic diseases as a first step in describing their 
natural histories. The more detailed the staging, the more precisely
critical points can be defined.

Determination of Duration of Stages of Disease.—Two general methods for
measuring the durations of identifiable stages of disease are commonly used. 
Either of the two methods may be used for any of these stages. 

The first method follows a population longitudinally from the time one
recognizable pathologic feature appears until a second feature appears. An
example of this method is the recent study in Worcester, Massachusetts, of
survival time following the first attack of cerebral thrombosis. In this instance
the second pathologic feature is death.

In the second method, the relationship duration = prevalence / incidence
is used. This approach generally involves a cross-sectional study to obtain 
prevalence and also reporting of new cases over some period of time to determine 
incidence. Dunn has used this method for estimating the survival time of
invasive uterine cancer. 
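
The arithmetic of the second method can be shown in a short sketch. The figures are hypothetical and are not taken from Dunn's study, and the function name `mean_duration` is an assumption introduced only for illustration. The relationship is an approximation that presumes incidence and duration are reasonably stable over the period studied.

```python
def mean_duration(prevalence: float, incidence: float) -> float:
    """Average duration of a stage, estimated as prevalence / incidence.

    prevalence: existing cases per unit of population at one point in time
    incidence:  new cases per unit of population per year
    The result is in years. A roughly steady state is assumed.
    """
    if incidence <= 0:
        raise ValueError("incidence must be positive")
    return prevalence / incidence


# Hypothetical figures: 150 existing cases per 100,000 population and
# 50 new cases per 100,000 population per year.
years = mean_duration(prevalence=150.0, incidence=50.0)
print(years)  # 3.0
```

Prevalence and incidence must refer to the same population and the same definition of a case; otherwise the quotient has no meaning as a duration.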

The indications and contraindications for the two types of analysis will 
not be discussed here. Each involves certain assumptions which must be carefully 
considered when the findings of the study are to be generalized beyond the partic- 
ular populations investigated. Either may be used when the natural history alone 
is to be investigated. In the type of problem of greatest concern in this paper, 
however, the chief interest is in a comparison of the natural history of disease 
with the course observed after introducing a new technique, namely, early 
case-finding efforts. 

If the longitudinal study is used in connection with cases identified by a 
new technique, the results cannot strictly be considered to represent the natural 
history because of the effect of the study on the disease. However, with certain
precautions the effect of detecting the cases early can be minimized by allowing 
all or a sample of cases to go untreated until the stage of disease at which diagnosis 
is normally established. This involves certain ethical problems but may be a 
feasible solution if there is no generally accepted opinion that early treatment 
of the disease in question is of known value. As an example, there is difference 
of opinion at the present time as to whether certain findings represent a precursor 
stage of classic diabetes. It may be acceptable to observe certain patients with 
these findings without recommending therapy until a diagnosis of classic diabetes 
can be made. 

If the cross-sectional approach is applied to this same question, the average 
duration can be computed from prevalence and incidence data. The effect of 
the cross-sectional study itself on the course of the disease is relatively little as 
compared with the longitudinal approach. 

Error of Interpretation.—An error occasionally made in interpreting studies 
of the course of disease is to confuse the additional time of observation of the 
disease resulting from early case-finding with an added increment of life due to 
improved treatment. This fallacy is illustrated in Table III. In this table, two 
pathologic events are indicated—P, a recognizable clinical finding, and D, death. 
In the usual pattern of medical events diagnosis is made at C and death occurs at 
D. If an early case-finding program is applied, diagnosis is made at C’. Death is 
delayed to D’ as a result of starting therapy early. The benefit due to early therapy
is represented by the increment DD’, and this can be measured by taking the 
difference between the two survival times (PD and PD’) following clinical 
finding P. The common error is to compare the survival times (CD and C’D’) 
from times of diagnosis C and C’, respectively. This comparison clearly exag- 
gerates the benefit of early case-finding, since the interval CC’ is not an added 
length of life but simply an added time of recognized disease. 


TABLE III. EFFECT OF EARLY CASE-FINDING ON SURVIVAL TIME


C’                P                   C                 D             D’
Early diagnosis   Pathologic change   Usual diagnosis   Usual death   Delayed death
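

The fallacy can also be worked through numerically. The sketch below uses hypothetical times, in years on a common clock, for the events named in the text; the class name `Course` and the variable name `lead_time` (a label for the interval CC’) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Course:
    """Times, in years on a common clock, of the events named in the text."""
    p: float          # P, the recognizable clinical finding
    diagnosis: float  # C (usual pattern) or C' (early case-finding)
    death: float      # D (usual pattern) or D' (after early therapy)


# Hypothetical values, chosen only for illustration.
usual = Course(p=1.0, diagnosis=3.0, death=5.0)  # C = 3, D = 5
early = Course(p=1.0, diagnosis=0.0, death=6.0)  # C' = 0, D' = 6

# Correct comparison: survival measured from the same event P (PD' - PD).
true_benefit = (early.death - early.p) - (usual.death - usual.p)

# Common error: survival measured from the two times of diagnosis (C'D' - CD).
apparent_benefit = (early.death - early.diagnosis) - (usual.death - usual.diagnosis)

# Added time of recognized disease (CC'), which is not added life.
lead_time = usual.diagnosis - early.diagnosis

print(true_benefit, apparent_benefit, lead_time)  # 1.0 4.0 3.0
```

With these figures the apparent gain of four years consists of one year of added life (DD’) and three years of earlier recognition (CC’).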


LEVEL OF OBJECTIVES 

In the previous sections it has been stated that the common objective of 
preventive services is to interfere with the pathologic events in the natural 
history of disease. It is frequently useful to evaluate a program in terms of 
some more limited objective. As an example, it would seem reasonable to evaluate 
an early case-finding program by determining the frequency of finding cases 
early. Studies of intermediate objectives are, as a rule, far easier to carry out 
than studies of ultimate objectives. The information obtained is of correspondingly 
more limited value. The shortcut is justified only when the relationship between 
the intermediate and ultimate objectives has already been established or when 
it is anticipated that it will be established in a subsequent study. In an early
case-finding effort it may have been clearly shown that finding cases of a specific 
disease in a given population leads to decreased mortality. It is then adequate 
to accept this information in evaluating various means of finding disease in the 
population and to confine these evaluations simply to determining appropriate 
case-finding rates. 

It should be remembered that the original relationship between intermediate 
and ultimate objectives may change in time. Thus if the population and medical 
personnel become more alert to symptoms and signs over a period of time, they 
may improve the normal community level of early case-finding to such an extent 
that available diagnostic screening tests offer no additional value. Conversely, 
the development of a new case-finding test may make early detection efforts 
profitable in an area where they were previously unrewarding. 


Levels of objectives form an extensive spectrum. The simplest level is the
quantity of money spent. This may be the extent of evaluation required in
certain administrative reports. More commonly it is required to show what
services were performed, a second level of objective. Beyond this a third level
would include some measure of the accomplishments of the service, such as
numbers of patients contacted, numbers of positive cases discovered, or volume
of treatment given.

All of the measures in the above paragraph may be considered intermediate 
in the sense that they do not measure the extent to which the natural history 
of disease is altered. This latter measure is the first level objective which is 
intrinsically valuable. 

Higher and more complex levels exist in this spectrum. Thus, Baumgartner
has mentioned that in the Soviet Union health is considered a means rather than 
an end. A successful program would in that setting have to be measured in values 
to the state rather than the individual, as, for example, units of production output. 
Beyond this, theologians or philosophers could add more complex objectives 
of health services. 

It is, then, to a certain extent arbitrary that improvement of health is 
accepted as an ultimate objective. It is felt that this is an objective that the 
medical profession generally accepts as intrinsically valuable, and lower level 
objectives are generally not acceptable, except insofar as they are felt to be 
indicators of health benefit. 


SUMMARY 


A broad concept of preventive medicine has been described, embracing 
all phases of medical care. All medical services are preventive services to the 
extent that they are concerned with altering favorably the natural history of 
disease. 

An evaluation model is described which is designed to answer the following 
questions with relation to health programs: (1) Does the program meet its ob- 
jectives, or (2) to what degree does it meet the objectives, or (3) how efficiently 
are the objectives met? 

A spectrum of intermediate and ultimate objectives is described. The ultimate
objective of particular interest in preventive medicine is that objective which
corresponds with the concept of prevention as an alteration of natural history
of disease in a favorable direction. In order to measure this objective it is neces-
sary to develop an analytic description of natural history of disease. A diagram-
matic representation of such a description is presented.

It is urged that specific diseases be extensively studied to develop clearly 
definable phases of which reliable analytic description can be made. 

The logical fallacy of confusing time of diagnosis with a pathologic change 
in disease history is discussed. It is pointed out that prolonged survival time 
following early diagnosis efforts may be due to two increments—one represents 
the additional portion of the early natural history brought under observation 
and the second represents the increased years of life attributable to beginning 
therapy at an advantageous time. It is necessary to distinguish these increments 
in evaluating preventive services. 

It is urged that an evaluation not be accepted as complete until the corre- 
spondence of intermediate with ultimate objectives is established. In the case of 
early diagnosis programs, this requirement demands proof of the preventive 
medicine maxim, “The earlier the treatment the more effective the prevention.”
In order to predict benefit from early case-finding, a disease must have the follow- 
ing characteristics in the population in which it is being studied: 

(1) There must be a known effective therapy. 

(2) There must be a diagnostic device capable of detecting the disease prior 
to its usual time of diagnosis in the community being studied. 

(3) There must be one or more critical points such that therapy instituted 
before the critical point is more effective in interfering with pathologic sequence 
than is therapy undertaken after the point. 

(4) Such a critical point must occur in the time sequence of the disease after 
the time when diagnosis first becomes possible and before the time when diagnosis 
is made under the usual disease pattern of the community. 
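
The four characteristics above amount to a set of conditions on the timing of events, and they can be written as a simple check. The sketch below is illustrative only: the class `DiseaseProfile`, its field names, and the function `early_case_finding_may_help` are hypothetical, and in practice the position of a critical point is rarely known with this precision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiseaseProfile:
    """Hypothetical summary of a disease in one community.

    Times are in years after biologic onset.
    """
    has_effective_therapy: bool       # characteristic (1)
    earliest_detection: float         # when diagnosis first becomes possible
    usual_diagnosis: float            # when diagnosis is usually made
    critical_points: Sequence[float]  # points after which therapy is less effective


def early_case_finding_may_help(d: DiseaseProfile) -> bool:
    """Return True only if all four listed characteristics are met."""
    detects_early = d.earliest_detection < d.usual_diagnosis  # (2)
    has_critical_point = len(d.critical_points) > 0           # (3)
    point_in_window = any(                                    # (4)
        d.earliest_detection < x < d.usual_diagnosis
        for x in d.critical_points
    )
    return (d.has_effective_therapy and detects_early
            and has_critical_point and point_in_window)


# A critical point at 2.5 years lies between detection (1.0) and usual diagnosis (4.0).
print(early_case_finding_may_help(DiseaseProfile(True, 1.0, 4.0, [2.5])))  # True
# A critical point at 5.0 years falls after usual diagnosis, so nothing is gained.
print(early_case_finding_may_help(DiseaseProfile(True, 1.0, 4.0, [5.0])))  # False
```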

AMONG the forces contributing to the ecologic pattern of disease in a
community, one cannot overlook the efforts of the organized health agencies.
Through the application of knowledge of the natural history of various diseases
these agencies have sought to raise the health level of the community by influenc-
ing, in one way or another, the incidence, prevalence, and severity of disease.
It is the purpose of this paper to describe major historical aspects of the
development of statistical indices for the appraisal of such organized community
health activities and to discuss certain principles to be considered in the formula-
tion of adequate indices of accomplishment.

Some 30 years ago, Wade Hampton Frost noted that “. . . the health officer
occupies the position of an agent to whom the public entrusts certain of its re-
sources in public money and cooperation, to be so invested that they may yield
the best returns in health; and in discharging the responsibilities of this position
he is expected to follow the same general principles of procedure as would a
fiscal agent under like circumstances.” His thesis was that the demonstration
of accomplishment should be supported by a simple and unquestioned theory
about the activity under study in conjunction with adequate statistical data,
“. . . for while the facts expressed in a statistical record constitute only a part
of the evidence required, they constitute the part which can be most conveniently
and forcibly presented and are essential to any quantitative statement of results
achieved.

“If we really desire to submit our judgement of what is being accomplished
to this final test of statistical evaluation we must set about collecting statistics
which will serve the purpose.”

It may seem remarkable that Frost wrote in this vein as late as 1925. His 
statement implied, first, that not all statistics on health and disease in the 
community were appropriate to measure the results achieved by health depart- 
ments and, second, that the logical bases for a quantitative evaluation of these 
results were often lacking. As we shall see, both implications find considerable
justification even today.


EVALUATION BY MORTALITY AND NATALITY STATISTICS

The formal gathering of statistics originated several centuries ago to provide 
government with data on the conditions of the governed. The collection of data 
on population, births, deaths, and other demographic characteristics was initiated 
primarily to furnish the state with information on which to base policy decisions. 
Graunt’s interest in vital statistics (1662), of which he is considered the founder, 
resulted in part from his preoccupation with what now is called urban planning 
or redevelopment. The Memoires des Intendants, summarized in the classic
work of Moheau (1778), were the seventeenth and eighteenth century equivalent 
of Reports of the President’s Council of Economic Advisers. The statistics on 
deaths were used simply to describe existing conditions and hardly ever to eval- 
uate governmental efforts in the field of health. Perhaps this was so because no 
one expected that governmental effort could do much about improving health 
conditions. Where, instead, governmental effort was considered to be decisive,
as in the maintenance of an adequate food supply, pertinent statistics were em-
ployed to measure the results. Thus, Graunt pointed with pride to the data which
showed that in London only a few persons died of starvation.

The statistical index which first served to measure the health condition of 
the community and which was already in use in Graunt’s time was the mortality 
rate or its equivalent. (With Halley’s contribution (1693), mortality was also 
measured in terms of survivorship and average life expectancy.) However, 
until the middle of the nineteenth century, students of vital statistics empha-
sized the regularity of the age distribution of deaths year after year and, save
for epidemic years, the relative constancy of mortality from year to year in any
one locality. Süssmilch, in the late eighteenth century, attributed such regu-
larity to a “Divine Order,” while Quetelet, 100 years later, attributed the same
regularity to “natural law.” So far as can be determined, mortality did occur
at a fairly constant rate during this period. This constancy in rate, incidentally, 
encouraged the use of life tables and the extension of mathematical actuarial 
theory. It is of interest to speculate whether the growth of insurance companies 
in the eighteenth century would have been as marked if the annual rate of mor- 
tality had not been constant. 

Interest in the variation in mortality and survivorship rates existed, but 
mainly to identify etiological factors. In the works of Graunt, of Süssmilch,
and of Quetelet, differences in mortality were examined in terms of sex, urban-
ism, climate, topography, and other physical and social environmental conditions
to discover causes of variation, not to measure the effectiveness of community 
health work or to determine policy in this field. 

Data were collected simply to describe the health problems, since there 
were no organized community health activities, and especially since ideas on 
what to do about the problems were lacking. Quetelet’s La physique sociale 
(1835), the best example of mid-nineteenth century quantitative study of com-
munity health problems, makes it clear that the major preoccupation was with
mortality. The impression is obtained that survival was the primary goal of 
the community. In a chapter devoted to statistical measurement of the prosperity 
of a population, Quetelet suggested use of the index: number of births divided 
by number of deaths in a specified unit of time. He pointed out that when this
index is above unity, health conditions of the population are favorable; when it 
is less than unity, conditions are unfavorable. Some 70 years later Raymond 
Pearl was to reintroduce this index, which he called the Vital Index of Population. 

This index provides a limited interpretation of the health conditions of a 
community since the numerical value of the index for a community with high 
mortality and high natality could be the same as that for a community with 
low mortality and low natality. 
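
A brief numerical sketch makes the limitation concrete. The counts below are hypothetical, and the function name `vital_index` is an assumption used only for illustration; exact fractions are used so that the two ratios can be compared without rounding.

```python
from fractions import Fraction


def vital_index(births: int, deaths: int) -> Fraction:
    """Births divided by deaths in the same unit of time."""
    if deaths <= 0:
        raise ValueError("deaths must be positive")
    return Fraction(births, deaths)


# Hypothetical annual counts per 1,000 population.
high_turnover = vital_index(births=40, deaths=30)  # high natality, high mortality
low_turnover = vital_index(births=20, deaths=15)   # low natality, low mortality

print(high_turnover, low_turnover)  # 4/3 4/3
```

Both communities show an index above unity, yet the death rate in the first is twice that in the second.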


PUBLIC HEALTH MOVEMENT 


The Public Health Movement and the bacteriologic discoveries which began 
almost simultaneously in the middle of the nineteenth century brought “. . . to
the mind of the average man the conception that life and death are not merely
dispensations of divine providence but lie within the control of the human mind
and the human will. . . .” This was a marked change in social attitude and it was
reflected in a changed approach toward the collection of statistics related to 
health problems. The new approach was characterized by: 

Attention to Cause of Death With Particular Reference to Infectious Diseases.—
Interest in the publication of data on deaths by cause led to the study of classi-
fications of causes of death by the first International Statistical Congress (1853),
and eventually in 1900 to the adoption of the International List of Causes of 
Death which, with revisions, is in use today. 

Comparisons of Mortality Rates Between Localities for Purposes of Stimulating 
Community Action to Improve Environmental Conditions.—Spurred by the con- 
viction, so ably presented by Chadwick (1842) and others, that the removal of 
filth and the correction of insanitary conditions would reduce epidemics and mor- 
tality from the epidemic diseases, the higher mortality for one community in 
comparison to that of another was regarded as an indication that the former 
required additional sanitary action. No longer were comparisons among commun- 
ities made solely to reveal possible reasons for differences, but they were made 
in order to emphasize the amount of work to be done by the community with 
the higher mortality. Pettenkofer’s (1873) remarks in comparing Munich’s
mortality with that of London illustrate this point. “If it was possible in London, 
in historical time, to reduce the death rate from 42 to 22 per thousand, we are 
well justified in hoping that in Munich too, we may be able to come down with 
our death rate, from 33 to 22. All we have to do is find out what factors and 
measures have contributed in London to this propitious result and to apply them 
intelligently to our conditions in Munich.” 

Consideration of the Amount of Sickness.—Systems of compulsory reporting 
of selected communicable diseases were initiated in many communities to provide 
the health authorities with data on the number of cases of disease and their 
distribution in the population. In addition, increasing interest was expressed 
in the sickness and disability which accompany disease. Pettenkofer estimated
the reduction in number of days of sickness and in costs that would result from
a reduction in number of deaths. (He assumed 34 cases of sickness for every death,
20 days of disability for every case of sickness, and a total of one florin per day of
disability in loss of wages and expenditure for medical and hospital care.) John
Shaw Billings, while still a military medical officer in 1878, suggested to the Sur-
geon General of the U.S. Army that the population census of 1880 include in-
formation on sickness because: “As is pointed out by the Royal Sanitary Com-
mission of England, however complete the registration of deaths may be it cannot
give a fair estimate of the sickness which is not fatal, it cannot indicate where
or how these are to be prevented, it cannot tell the cost which it is worth incurring
for their diminution.” Actually, this suggestion that the 1880 census include
information on sickness was followed, and for the first time in this country 
statistics on sickness and disability became part of the measurements of health 
problems. 
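
Pettenkofer's assumptions, as quoted above, reduce to a chain of multiplications. The sketch below applies them to an illustrative input: a fall in the death rate from 33 to 22 per thousand, the Munich target cited earlier. Combining the two passages in this way is an assumption made here for illustration, and the function name `sickness_savings` is hypothetical.

```python
from typing import Tuple

# Assumptions quoted in the text (attributed to Pettenkofer).
CASES_PER_DEATH = 34  # cases of sickness for every death
DAYS_PER_CASE = 20    # days of disability for every case of sickness
FLORINS_PER_DAY = 1   # lost wages plus medical and hospital care


def sickness_savings(deaths_averted: int) -> Tuple[int, int]:
    """Return (days of sickness avoided, florins saved)."""
    days = deaths_averted * CASES_PER_DEATH * DAYS_PER_CASE
    return days, days * FLORINS_PER_DAY


# Illustrative input: 33 - 22 = 11 deaths averted per 1,000 population per year.
days, florins = sickness_savings(deaths_averted=33 - 22)
print(days, florins)  # 7480 7480
```

On these assumptions each death averted stands for 680 days of sickness, so the estimate is only as sound as the fixed ratio of sickness to death on which it rests.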

In brief, it would appear that in conjunction with increasing knowledge 
about specific diseases, with greater assurance that certain diseases could be
prevented and with wider acceptance of the view that health was not simply
the absence of death, data were collected on specific causes of death and on
sickness and were utilized as indices of required activity for the solution of health 
problems. These historical associations and all that they entail must be borne 
in mind in order to understand the factors which have brought about the develop- 
ment of the indices of health department activities in current use. 


STATISTICAL APPRAISAL OF HEALTH DEPARTMENT ACTIVITIES 


In this country, the growth of governmental health activities, which to a 
large extent resulted from the Public Health Movement, manifested itself in 
terms of increase in both numbers of health departments and types of activities. 
Chapin described well the evolution in types of activities: “The sanitarians
of the nineteenth century, up to about 1870, were, except for maritime quar-
antine, occupied chiefly with the problems of municipal cleanliness. . . . The
attempt to control contagion became the chief function of the health department
for the next fifty years. . . . During the last twenty years or so [Chapin was writing
in 1921], a new phase of public health work has been evident. Attention is being
directed more and more, not only to persons, but to individual persons. Education
has become one of the most important factors, and it is the education of the
individual which is aimed at. Cure also must be personal and individual. The
great health movements, like the movement against tuberculosis and that against
venereal diseases, the hookworm campaign in the South, the prevention of
infant mortality and the medical supervision of school children, all seek out the
individual, teach right ways of living and offer treatment.” Today we can add
to this list such movements as those aimed at the control of cancer, heart disease, 
and mental disorders. 

At the same time, ideas regarding the collection and utilization of statistics 
to appraise health department activities began to be formulated in a more 
definitive manner. Shattuck in 1850 had recommended sanitation surveys
along the lines of his report on Massachusetts. Both in this country and in the 
United Kingdom, community surveys were actually carried out. They attempted 
to measure the health problems of the community and indirectly appraised the
activities of the government. The most far-reaching of these surveys in this
country, from the standpoint of the development of statistical indices, was 
that undertaken by the American Public Health Association’s Committee on 
Municipal Health Department Practice (now Committee on Administrative 
Practice) in cooperation with the U.S. Public Health Service and the Metro- 
politan Life Insurance Company. This survey, which obtained data on the health 
activities of 83 cities, was the origin of the establishment of the Association's 
Evaluation Schedule “for use in the study and appraisal of community health
programs.” Until a few years ago, the use of this schedule included the calcu-
lation of indices for purposes of comparing one community with another. In
the 1947 schedule, for example, formulas for calculating over 130 ratios or indices 
were provided, and suggestions were made for comparing 32 of these ratios with 
either ideal or the best of observed indices. It is revealing to note the kinds of 
indices which are suggested: percentage of hospital beds in approved hospitals; 
population per practicing physician and per practicing dentist; percentage of 
population in communities over 2,500 served with approved water and with 
approved sewerage systems; percentage of rural school children served with 
approved water supplies and with approved means of excreta disposal; percentage 
of food handlers reached by group instruction program; percentage of restaurants 
and lunch counters with satisfactory facilities; percentage of bottled milk pas- 
teurized; percentage of children under 2 years given immunization for diphtheria, 
smallpox, whooping cough; newly reported cases per tuberculosis death; tuber- 
culosis deaths per 100,000 population; percentage of tuberculosis cases reported 
by death certificates; percentage of contacts of newly reported tuberculosis 
cases examined; percentage of syphilis cases reported in primary, secondary, 
and early latent stages; contacts reported per 100 cases in primary, secondary,
and early latent stages; percentage of reported syphilis contacts examined;
puerperal deaths per 1,000 total births; percentage of antepartum cases under
medical supervision before the sixth month; percentage of women delivered at
home under postpartum nursing supervision; percentage of births in hospital;
deaths under 1 year of age per 1,000 live births; deaths from diarrhea and enteritis
under 1 year per 1,000 live births; percentage of infants under nursing super- 
vision before 1 month; percentage of entering school children examined with 
parent present; percentage of elementary school children with dental work com- 
pleted; deaths from motor accidents per 100,000 population; deaths from home 
accidents per 100,000 population; cents per capita spent by the health depart- 
ment. 

In more recent revisions of this guide (1955), suggestions are made to collect 
data that may be used to calculate the above types of indices but less emphasis 
is given to the calculation of indices. In line with newer public health practices, 
community agencies are asked questions such as: How many children were 
served last year in well-child conferences, under age 1, 1 to 2 . . . 5 to 6? However,
they are not told to calculate ratios or averages. Greater interest in certain chronic
diseases, such as cancer, heart disease, and diabetes, and in mental hygiene, industrial
hygiene, health education, social services, and rehabilitation, is demonstrated
by expansion of sections devoted to these activities; but suggestions for the
appraisal of activities are usually in the form of questions such as: “What de-
tection center and cancer clinic facilities are available in the community?”
“Does the educational program reach: (a) physicians, (b) nurses, (c) dentists, 
(d) general public?” 

The changes in public health activities described previously by Chapin
brought about introduction of indices which serve to measure the operations of 
the health departments rather than the bearing of these operations on the health of 
the community or the survivorship of its people. Some of these indices are easy to 
interpret in terms of the effectiveness of health department activities to reduce 
disability or prolong life. In these terms, the percentage of children under 2 
years of age given immunization for diphtheria has meaning. There is evidence
of the effectiveness of such immunization to prevent diphtheria (provided that 
the organism remains unchanged) and consequently, if all of the children have 
been immunized at a young age, we can anticipate that the prevention of sickness 
and death from diphtheria has been accomplished. It is less easy to interpret, 
in terms of reduction of sickness or death, an index such as the proportion of 
food handlers who have received group instruction, or an index based on the 
number of persons who receive educational material. Greenberg and Mattison
(1955) point out that the latter index may be used as a measure of a step in the
evaluation of the effectiveness of health educational literature in eventually
reducing morbidity and mortality. This view would assume that the succeeding
steps actually will achieve the final objective of the health department activity.
The burden of proof for such an assumption is on the users of the index.

Apparently, the formulation of statistical indices currently suggested or
employed to appraise health department activities is based both on the degree
of knowledge of the factors which affect health and on tenets which underlie 
current public health practices. It is interesting to note also: 

(1) Mortality rates still occupy a central place among appraisal indices as 
shown by the number of these rates mentioned. In part, this may be due to the 
noticeable lack of data on sickness and disability which only very recently is 
being remedied. In part, it may be due to a continued preoccupation with sur- 
vival on the part of the community. In this respect, it may be that the general 
public lags behind the public health worker who has set as his objective to achieve 
something more than absence of disease. On the other hand, the public health 
worker has shown only limited interest in the collection of data on sickness and 
disability, let alone on other signs of variation in health status. 

(2) Indices for the appraisal of health department activities aimed at such 
chronic conditions as cancer, heart disease, and mental disorders tend to measure 
activities directed at health professions and facilities rather than at the disease status 
of the community. For example, questions are asked regarding the existence of 
special provisions for the continuous care of children with rheumatic fever 
and the agencies responsible for those provisions, but not regarding the number 
of children with rheumatic fever in the community, the number under continuous 
care, the number not under care and the reasons for this, or regarding the dis- 
ability experience of both groups. When mortality or morbidity statistics are 
available on some of these chronic disease conditions, they are employed mainly 
for comparisons among communities in order to explore possible etiological factors
rather than to appraise the relative effectiveness of the activities of the commun- 
ities. This approach is similar in many respects to the situation observed a cen- 
tury ago. Again, it could be an indication of a feeling of helplessness regarding 
any measure of prevention or control. 

(3) A new feature of the current utilization of appraisal indices is that of 
comparing observations in the community with standards established by experts. 
Some of these standards represent objectives more or less theoretically possible, 
as would be the case of setting the standard of infectious diseases rates at zero; 
others represent generalization of experiences which are considered to indicate 
adequacy, as in setting the standards of relative number of general hospital beds 
equal to that found in the 12 states with the highest relative number; others 
represent the rationalizations of the hopes and convictions of students of the 
subject. The tendency to follow this approach reveals perhaps that public health 
workers still retain faith in someone's ability to achieve complete understanding 
of the means of meeting community health problems. It seems unnecessary to 
point out that such an approach could be dangerous. It could lead to blind 
reliance on authority, or worse yet, to a smug satisfaction with conditions which
satisfy current opinions.

INTERPRETATION OF STATISTICAL INDICES OF HEALTH DEPARTMENT ACTIVITIES


As should be clear now, many statistical indices have been introduced to 
appraise health department activities. The multiplication of these indices cor- 
responds to the increase in number and scope of health department activities. 
The construction of an index is mechanically a simple procedure. But the inter- 
pretation to be given to variations in the numerical value of an index is ordinarily 
a complex analytical operation, involving an understanding of the elements of 
the index, the phenomenon it actually measures, the factors which influence 
variations in the phenomenon, and the relationships between the phenomenon 
and the health status of the community. Even when dealing with an index such 
as the infant mortality rate, long recognized as being a sensitive measure of 
well-defined objectives of health agencies, the interpretation of differences among 
communities is not so simple. For example, in Pennsylvania 62 of the 67 counties 
do not have a local health department or carry out organized local health ac- 
tivities. The median infant mortality rate of these counties in 1956 was 22.7. 
In the neighboring state of Ohio there are 13 counties with full-time health
departments; in New York there are 16 such counties.
The medians of the infant mortality rates in these counties with health depart- 
ments in 1956 were 24.6 for Ohio and 22.9 for New York. In other words there 
are a number of counties in Pennsylvania which have no organized local commun- 
ity health activities yet which have a better record in terms of infant mortality 
than do counties with health department activities. Similar observations can be 
made regarding other indices and other comparisons. The point is that com- 
parisons among communities or within communities require a logical analytic 
approach. In the case of the comparisons just cited, it may well be that com- 
munities with well-trained physicians and a population characterized by a high 
level of education and understanding of health problems will have a low infant
mortality even though health agencies do not exist. The appraisal of health 
department activities must take into consideration the specific role of the health 
agency as well as the general effects of progress in public health concepts. These 
concepts are acquired by the community as part of its social evolution, in which
the health department participates.

In seeking for quantitative indicators of the effectiveness of the operations 
of health agencies, the same logical approach is required as for the measurement 
of the effects of any procedure on the behavior and reactions of groups of mice 
or men. This was emphasized some time ago by Edgar Sydenstricker,'® who in 
1926 outlined certain principles which should guide the measurement of public 
health work. According to his views: (1) “Specific activities, rather than the
program as a whole, should be measured first.” (2) “The objectives and methods
of a public health effort should be clearly defined.” (3) “Principles of
experimentation should be applied.” (4) “The use of ‘experimental’ and ‘control’
groups or areas should be followed.”

These principles, which parallel those set forth relative to physiologic experi- 
mentation by Claude Bernard in 1865, were further amplified and illustrated 
in the Pittsburgh Conference on Methods in Public Health Research in 1950. 
It is unnecessary to discuss them in great detail but certain of their consequences 
deserve elaboration since they are pertinent to the topic of this paper. They are 
those referable to (1) formulation and selection of numerical indices, (2) com- 
parisons of effectiveness of health activities among and within communities. 

Formulation and Selection of Appraisal Indices.—Any reaction or behavior 
manifestation of the population group or of the health department which can 
be counted or measured may be used as the basis for an index. This may be 
expressed in the form of a ratio (proportion of school beginners immunized)
or an average (mean duration of life). This means that the number of indices which
can be formulated is large, and the issue is to select among the many possible
indices those that satisfy best the logical requirements of all measurements and, 
specifically, of measurements of the effectiveness of health department activities. 
Logical requirements of measurements in field studies were stated at the 
1950 Pittsburgh Conference of the Public Health Study Section: “The measurements
used should be: 1. Objective; 2. Repeatable by different observers and at
different times; 3. Efficient in terms of clear and mutually exclusive divisions,
undistorted scale, and broad range of values to be recorded; 4. As simple, generally
available, easily performed, and inexpensive as possible; 5. Accurate in sensitivity
and specificity; 6. Tested in pilot study.”

In line with these requirements, a first consideration pertinent to the selec- 
tion of an appraisal index is that the elements of the index, the counts or measure- 
ments of the things or activities under scrutiny, are objective, reliable, accurate, 
and easily obtainable. All things being equal, the most appropriate data would 
be those derived from some routine operation which is independent of the specific 
health department activity. To the extent that birth and death certification, 
disease reporting and registration, governmental census, school and other
institutional records, etc. are reliable and accurate, they provide data that are easily
obtained and objective, i.e., persons involved in the activity cannot interject
bias in the observation. For certain specific activities, however, such as “classes
for expectant mothers,” it is difficult to obtain routine independent data that
could conceivably relate to an appraisal of this activity which seeks to engender 
a wholesome attitude about pregnancy and childbirth. To obtain direct or 
indirect measurements for the evaluation of such activities requires that either 
a mechanism for independent objective appraisal be established as part of the 
activity or that independent objective appraisals be carried out from time to 
time. In either case, a research project must be undertaken with appraisal as its 
purpose. 

A second consideration pertinent to the formulation of indices is availability 
of knowledge about the properties of the ratio or average employed, and of the 
factors which are related to variations in its numerical value. Mortality, mor- 
bidity, case fatality rates, ratios of health facilities and personnel to population, 
and so on measure with greater or less precision many aspects of the interaction 
between man and his total environment. To use these rates or ratios as indicators 
of the effectiveness of health department activities assumes that their char- 
acteristics and the factors related to their variation are understood and, further- 
more, that the health department activity is also a factor in that variation. As 
an illustration, the ratio of hospital beds per 1,000 persons is correlated with 
the ratio of physicians per 1,000 persons; both are highly correlated with the 
degree of wealth and with “cultural” facilities, such as centers of learning,
available in the community. Any attempt at appraising some specific health
department activity, even one devoted to the administration of Hill-Burton funds,
through the use of the ratio of hospital beds per 1,000 persons must take into 
consideration the dependency of both number of hospital beds and health depart- 
ment activity on the wealth and high level of education in the community. 
Similarly, the validity of using an index based on per capita expenditure for 
the health department as a measure of acceptance of the health department 
by the community depends on the means by which the budget of the community 
is decided upon, and the extent to which the health department’s activities are 
a factor in the decisions reached. Knowledge regarding the infant mortality rate 
and the factors related to its variation has served to give this rate the standing 
of a general index of health department activity. High mortality among infants
is due in large part to infectious diseases and is indicative of the incidence and
prevalence of these and other diseases with a common origin in ignorance, poverty,
lack of sanitation, and of good hygienic practices. Use of the infant mortality
rate as an index which appraises health department activities aimed at improvement
of sanitation, raising of hygienic standards, and prevention and control of
infectious diseases is amply justified in principle. However, as has been mentioned,
community actions in addition to health department activities play a role in 
the variation in the numerical value of this index. 

A third and most important consideration in formulating appraisal indices 
is that of establishing the nexus between an index and the objective of the ac- 
tivity it is supposed to measure. This nexus can be correctly perceived only when 
the objective of the health department activity is stated precisely and unequiv- 
ocally. Confusion is often created when a variety of objectives are included 
under a term that could be understood to mean a single activity, e.g.,
well-child conference, mental hygiene, or cancer control. For example, cancer control
programs in this country may have a number of different specific objectives, from 
the maintenance of a tumor registry to the operation of diagnostic services. An 
index of appraisal for the former should be based on the completeness and
accuracy of the registry; for the latter it should be based on the accuracy, stage,
and timing of diagnosis, on the load it carries for the community, or, if its function
includes follow-up, the outcome of the diagnoses. For the registry an index based
on cancer morbidity, incidence, or prevalence would make little sense; it might be
appropriate under certain conditions for the diagnostic services.

Comparisons Among Communities and Within Communities.—It has been
mentioned that with the development of the public health movement, compari- 
sons among communities were employed to indicate what could be accomplished. 
For this purpose such comparisons may be worth while, particularly if differences 
among communities serve to stimulate the public health authorities to probe 
more deeply into the specific health department activities which may have re- 
sulted in the observed differences. But it is precisely at this point that difficulties 
are encountered, for an appraisal of specific health department activities requires 
the disentanglement of the action of the health department from pertinent con- 
ditions in the community. Even when dealing with such a well-established mea- 
sure as diphtheria immunization, comparisons cannot be easily interpreted. 
Differences among communities in diphtheria mortality obviously will reflect 
differences in the operation of the health agency. However, the factors which 
relate to the differences in activity may be of many kinds: personnel, facilities, 
cooperation of the medical society, public acceptance, and level of diphtheria
experience in preceding years. Unless in some way the factors that affect
differences are taken into consideration, such comparisons have meaning only to reveal
that one community has more problems than another. Comparisons become 
even more difficult to interpret, for example, when dealing with the number 
of defects found at school health examinations for the prevention of which there 
are no specific health department activities. For such indices, it is not immediately 
obvious what factors to take into account to achieve meaningful comparisons. 

On the surface, long-term comparisons in the same community would seem to 
offer easier interpretation. Observations are made at one initial point of time, a pub- 
lic health measure is introduced, and changes in the health condition are observed. 
Again, suppose we take as illustration a well-established procedure such as 
diphtheria immunization. The case or mortality rate at a certain point in time is 
measured, the immunization program is introduced, and the changes in the rate 
are observed. In this instance the community serves as its own standard of
comparison. This is valid if no other changes in the community occur which may produce
the same effects. For a specific procedure such as diphtheria immunization, this
would not be expected, yet even for such an activity the results in one community
might or might not be related to direct action by the staff of the health
department. As an example of the former, the well-known antiyaws campaign in Haiti
might be cited because it achieved practically universal compulsory treatment. 
In other instances, the results could be due to the indirect effects of the
educational efforts of the health department which have led to action on the
part of parents, of medical practitioners, etc. 

When dealing with less specific health department activities, particularly 
those that concern some aspect of social behavior, it becomes more difficult to 
determine whether or not changes in health conditions are a product of the health 
department’s activity. In such circumstances, a comparison group, a standard, 
or “control” is definitely needed. 

Establishing criteria for the selection of comparison groups which will pro- 
vide a basis of meaningful interpretation is a particularly difficult problem. 
One such criterion is obviously that the “experimental” and “control” groups
should be selected without bias. It is necessary, however, to secure the cooperation
of the group which is going to submit to the experiment, and this attitude of
cooperation can itself prejudice the experiment. Furthermore, it is
difficult to find two groups that are alike in all but the activity under study. If
this activity is to be publicized and is stated to be something “good,” how can
one expect the “control” group not to want it also? Except for the
Newburgh-Kingston study of fluoridation and reduction of dental caries, there have been
few situations in which this problem has been successfully met. Several approaches 
to this problem might be attempted. I shall describe two which we outlined for 
specific situations. 

In the first of these, it was felt that for political reasons once its use was 
announced no community could be deprived of a prophylaxis procedure even 
though it was still very much in the experimental stage. This is a view which 
public officials as well as medical practitioners hold when a new drug or procedure 
catches the public’s fancy. To avoid political repercussions and still achieve some 
means of evaluating the procedure, the following steps were suggested: (1) 
measurement of the status of the population of the total area containing the 
several communities with respect to the disease condition under study; and 
(2) introduction of the prophylaxis procedure gradually, a few communities at 
a time, the selection of the communities to follow a plan whereby certain
communities would serve as “controls” for a given period of time.
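
The gradual introduction described in step (2) can be sketched as a staggered schedule. What follows is only an illustration under stated assumptions, not the plan actually used: the community labels, the number of communities entering per period, and the use of a random ordering to decide which communities wait are all hypothetical.

```python
import random


def staggered_schedule(communities, per_period, seed=0):
    """Assign each community the period in which it starts the procedure.

    Communities that have not yet started serve as temporary "controls".
    Random ordering is an assumption of this sketch, chosen so that the
    early starters are not hand-picked.
    """
    order = list(communities)
    random.Random(seed).shuffle(order)
    return {name: i // per_period + 1 for i, name in enumerate(order)}


def controls_in_period(schedule, period):
    """Communities still without the procedure during the given period."""
    return sorted(name for name, start in schedule.items() if start > period)


# Hypothetical communities, two entering per period.
schedule = staggered_schedule(["A", "B", "C", "D", "E", "F"], per_period=2)
for period in (1, 2, 3):
    print(period, controls_in_period(schedule, period))
```

With six communities and two entering per period, four remain as controls in the first period, two in the second, and none in the third, so every community eventually receives the procedure.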

In the second, efforts were being made to improve simultaneously the gen- 
eral economic condition and the health program of certain areas of a country. 
In developing possible designs for the evaluation of the program the following 
considerations seemed important: 

(1) Improvement in the economic and educational level is anticipated 
throughout the whole country. Therefore, we must expect that the effects of 
this improvement in terms of increased demand for health services and increased 
realization of health problems will be country-wide. 

(2) The study area will be most affected by these anticipated economic 
and social improvements and consequently will be expected to reveal a higher 
increase in demand for health services and in realization of health problems than 
other areas of the country. However, within this area there will also undoubtedly 
be variations in the degree of economic and social improvement. 

(3) When the details of the program are publicized, other areas within the 
country will seek to adopt one or more aspects of the program.

In view of these considerations, it seemed rather hopeless to set up fixed
“control areas.” That is, there was no point in asking the question: Has the
utilization and quality of services improved more in the study area than in area X
(where area X is comparable to the study area except for the program)? Instead, 
the question was rephrased as follows: Has the utilization and quality of ser- 
vices improved at a greater rate than expected, relative to economic changes, in 
those parts of the study area and in other areas where the program has been put 
into effect than in the areas of the country in which the program has not been 
put in operation? 

To answer this question means that we must (a) obtain indices both of 
economic and social development and of improvement in health services for all 
the communities of the country for the period in which comparisons are desired; 
(b) measure the relationship between improvement in economic and social 
conditions and improvement of health services in all of the country where the 
essential parts of the program have been put in operation and where they have 
not; (c) make comparisons of changes in health services primarily between those 
areas in which similar changes in economic and social conditions have taken
place.
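
Steps (a) to (c) amount to comparing program and non-program areas only within groups that experienced similar economic and social change. A minimal sketch of that bookkeeping follows; the area records, the index values, and the width of the strata are hypothetical and are not taken from the study described.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (area, change in economic index,
# change in health-service index, program in operation?)
areas = [
    ("a1", 2.0, 1.0, False),
    ("a2", 2.5, 3.0, True),
    ("a3", 7.0, 4.0, False),
    ("a4", 6.5, 6.5, True),
    ("a5", 7.5, 5.0, False),
    ("a6", 1.5, 1.5, False),
]


def stratum(economic_change, width=5.0):
    """Band of economic change an area falls in (band width is assumed)."""
    return int(economic_change // width)


def program_effect_by_stratum(records, width=5.0):
    """Mean health-service change, program minus non-program, per stratum."""
    groups = defaultdict(lambda: {True: [], False: []})
    for _name, econ, health, has_program in records:
        groups[stratum(econ, width)][has_program].append(health)
    effects = {}
    for key, g in sorted(groups.items()):
        if g[True] and g[False]:  # both kinds of area are needed to compare
            effects[key] = mean(g[True]) - mean(g[False])
    return effects


print(program_effect_by_stratum(areas))
```

Strata that contain only program areas, or only non-program areas, are skipped because they offer no comparison between areas of similar economic change.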


SUMMARY 


This review of the development of health department appraisal indices 
and outline of some of the logical criteria required for the correct interpretation 
of variations in these indices reveal a number of points that are important for 
establishing quantitative measures of the effects of community health activities. 

The existence of information on mortality led to the utilization of mortality 
statistics, first, as a measure of the health status of the community; later, as a 
tool in the search for etiological factors and as a means of arousing the community 
to action; and, finally, as an index of achievement of community health work. 

Increase in knowledge about etiological factors of certain causes of death, 
demonstrations of the preventability of certain disease conditions, and a growing 
community realization that health is not simply absence of death were
accompanied by the initiation of routine procedures to collect data on specific causes
of death, sicknesses, immunity status, etc., and to utilize these data for the
construction of indices to measure health needs of the community as well as the
achievements of its health agency. 

Meanwhile, spurred on by a broadening definition of health, community 
health agencies have moved toward activities which are less directly or immed- 
iately concerned with death or infectious diseases. As a result, available mor- 
tality and morbidity statistics have become less useful and less appropriate for 
the construction of indices to measure achievement, and are being replaced by 
data on the amount of health department operations and on the extent of health 
facilities and personnel in the community. In formulating appraisal indices 
of activities derived from the broader concepts of public health, the focus of 
interest has shifted from a consideration of the health status of the community 
to that of volume and kinds of health services and facilities provided. 
Consequently, variations in these indices are difficult to interpret in terms of the
adequacy with which health needs of the community are being satisfied. 

It is also difficult to assess, from these and other appraisal indices, the 
actual contribution which the activities of a health agency have made in meeting
certain health and disease problems of the community. Each health agency is 
an integral part of the community, and changes in philosophy and activities of 
an agency are reflected in, and reflect, changes in philosophy and activities of 
its community. By the direct examination of changes in the numerical value of 
the indices, it is not always possible to identify which changes in health status 
(or whatever else is measured by the indices) are due specifically to health de- 
partment activities and which are due to general community evolution. On this 
point, a question may well be raised as to the importance of differentiating pre- 
cisely between the activities of the health department and those of its community. 
In answer, it is to be noted that such differentiation is important, not only for 
theoretic reasons but also for reasons of practical policy. When faced with 
the problem of recommending expenditures to improve the health of communities 
it would be well to know if expenditures for the expansion of health department 
activities are effective without corresponding efforts to accelerate social evolution. 

To overcome these difficulties, to formulate appraisal indices which can be 
correctly interpreted, requires the application of sound investigative methods. 
Some of the logical criteria to be considered in constructing indices and in com- 
paring communities both in time and space have been examined and their appli- 
cation illustrated. The illustrations make it clear that a community agency 
sincerely interested in measuring the effects of its activities on the health of the 
people must be willing to establish a research program for this purpose. 

An essential step in the planning of such research is to recognize the
principle that the focus of appraisal is not the volume of facilities and services but
the measurement of the health needs and status. Fulfillment of this principle
requires that at least two conditions be satisfied: (1) statements of the immediate
objectives of specific health department activities and (2) systematic collection
of data on the health problems of the community. Some health department 
activities are aimed directly and specifically at the elimination of certain causes 
of death, others at prevention of certain diseases, others at reduction of certain 
disabilities, others at promotion of certain health practices, and others are 
intended simply to observe accepted norms of administration. Explicit statements 
regarding the immediate aims of a number of specific activities are frequently 
lacking, and sometimes the impression is gained that all these aims are gradations 
of a single aim such as prolongation of life or promotion of health. To determine 
how well specific activities meet their stated objectives and how well these ob- 
jectives meet the health problems of the community requires data on mortality, 
sickness, disability, health practices, and attitudes. The means for the collection 
of these data exist in all communities, but the organization and, for want of a 
better term, interest are often missing. A true appraisal of community health
activities, measurements of achievements and goals, is obtained only when specific 
aims of these activities are contrasted with corresponding data on the health 
needs and problems of the community. 

The statement that a disease and a characteristic are associated can be
interpreted in a number of different ways. 

(1) The deliberate introduction of the characteristic into a subgroup of a 
population will be followed by an altered incidence of the disease in that subgroup. 

(2) Those members of a population possessing the characteristic will experi- 
ence a different incidence of the disease than will those lacking it, the route by 
which the characteristic was acquired being unspecified. 

(3) The prevalence of the characteristic among newly developed cases of 
the disease differs from its prevalence among those in whom the disease did not 
develop. 

The logical content of these three statements is obviously not the same, and 
the presence of an association of any one of the three types does not necessarily 
imply that a similar association exists for the other two. We may refer to the 
first as a causal association and to the other two as observational associations. 

In studies of disease etiology, the major interest is in causal associations. It 
is often not possible, however, to study such associations by direct experimenta- 
tion in populations of interest, especially when the introduced characteristic 
may lead to an increased incidence of the disease. The introduction of control 
measures designed to decrease the incidence of the disease will, of course, some- 
times provide such an experimental test.1 When experimental tests are not pos- 
sible, however, inferences about causal associations must be drawn either from 
a variety of observational associations or from a study of causal associations in
animal populations or from a combination of both. In the nature of the case
such inferences cannot be certain and, what is perhaps worse, no quantitative
characterization of the uncertainty appears possible. There appear to be no
general rules of inference or routines of thought that are helpful in such
situations, although a number have been proposed. Particularly mischievous in this
connection, in our opinion, is the distinction sometimes drawn between
“statistical” and “biologic” evidence.


MEASUREMENT OF RISK OF DISEASE OCCURRENCE 


Our concern here, however, is not with these questions of inference but with 
observational associations, the study of which is clearly required, irrespective 
of the exact nature of the causal inferences that they will support. Lacking the 
ability to introduce deliberately the characteristic into a subgroup, the next 
best step involves subclassifying the members of some well-defined population
(or a sample of it) according to whether they do or do not possess the
characteristic. One may then count the number of cases of disease for those with and
without the characteristic in a number of different ways: (1) the number of
new cases of the disease developing during some period of time, say a year, usually
designated by the phrase incidence, (2) the number of deaths from the disease
occurring during a given period of time, the mortality, (3) the number of live
individuals having the disease at some moment of time, the point prevalence, or
(4) the number of live individuals having the disease at any time during some
interval, the interval prevalence. Certain systematic relationships obtain among
these different measures.2 The interval prevalence of a disease, for example,
will never be less than any point prevalence within that interval. In comparisons
among groups with different characteristics, the various measures will often
yield quite similar results. Although strictly speaking etiological aspects are
best analyzed from the point of view of incidence, these empirical relationships
have encouraged the use of the most convenient measure for study purposes,
usually mortality. In strict logic, however, there is no reason why the different
measures should yield even qualitatively similar results. Neyman, for instance,
has constructed an example in which comparison of point prevalence in two
groups showed a large difference in one direction, while comparison of incidence
showed a large difference in the opposite direction.3 Furthermore, examples are
sometimes encountered in practice in which this question assumes importance.
Thus, because tuberculosis has a more rapidly fatal course among Negroes than
among white persons, the prevalence rates in the two groups are approximately
equal, even though the incidence of the disease is greater in Negroes than in
white persons.4

TABLE I.

                         DEVELOPMENT OF DISEASE
                      NUMBER OF        NUMBER OF
CHARACTERISTIC        INDIVIDUALS +    INDIVIDUALS −    TOTAL

Present                    a                b           a + b
Absent                     c                d           c + d

Total                    a + c            b + d         a + b + c + d
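
The four counts distinguished above can be illustrated with a small sketch. The `Case` record, its field names, and the four cases are hypothetical, invented only to show how the counts differ; dividing each count by the population at risk would give the corresponding rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    onset: float            # time the disease developed
    end: Optional[float]    # time of recovery or death; None if still ill
    died: bool = False      # True if `end` marks death from the disease


def incidence(cases, start, stop):
    """New cases developing during [start, stop)."""
    return sum(start <= c.onset < stop for c in cases)


def mortality(cases, start, stop):
    """Deaths from the disease during [start, stop)."""
    return sum(c.died and c.end is not None and start <= c.end < stop
               for c in cases)


def point_prevalence(cases, t):
    """Live individuals having the disease at moment t."""
    return sum(c.onset <= t and (c.end is None or t < c.end) for c in cases)


def interval_prevalence(cases, start, stop):
    """Live individuals having the disease at any time during [start, stop)."""
    return sum(c.onset < stop and (c.end is None or c.end > start)
               for c in cases)


# Hypothetical cases observed over the year [0, 1).
cases = [
    Case(onset=-1.0, end=0.5, died=True),  # ill before the year, dies mid-year
    Case(onset=0.2, end=None),             # new case, still ill
    Case(onset=0.4, end=0.6),              # new case, recovers
    Case(onset=-2.0, end=-1.5),            # recovered before the year began
]

print(incidence(cases, 0, 1), mortality(cases, 0, 1),
      point_prevalence(cases, 0.5), interval_prevalence(cases, 0, 1))
```

In this toy cohort the interval prevalence for the year exceeds the point prevalence at mid-year, consistent with the relationship stated in the text.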

In the discussion that follows, we shall assume that the measure adopted 
is incidence. With certain obvious modifications in language, everything said 
will be applicable to other measures as well; the wisdom of using another measure 
in any particular instance depends upon special circumstances which must be 
individually evaluated. 

A population may be classified by presence or absence of the characteristic 
and development or nondevelopment of the disease (Table I). The number of 
individuals among those with the characteristic newly developing the disease 
is designated by a; b designates the number with the characteristic that did not 
develop the disease; and so forth. The incidence of the disease for the time period
covered among those with the characteristic is thus $a/(a+b)$ and among those
without it $c/(c+d)$. It is customary to speak of the relative risk of developing the disease,
defined as the ratio of the incidence for those with and without the characteristic:

$$\text{Relative risk} = \frac{a/(a+b)}{c/(c+d)} = \frac{a(c+d)}{c(a+b)}.$$

Thus, if a = 1,000, b = 99,000, c = 500, and d = 99,500, the risk of developing
the disease for those with the characteristic is twice the risk of those lacking it. 
If this relative risk is unity, we say the disease and characteristic are unassociated. 
The value of the relative risk, when other than unity, provides a measure of the 
degree of association; ratios in excess of unity indicate positive, and below unity 
negative, association. Other measures of association are possible, of course, but 
in the early stages of an investigation of the causal role of a characteristic, the 
relative risk provides a most useful descriptive summary of the association. A 
more extended discussion of this viewpoint will be presented in the section on 
“Potential Sources of Error in Retrospective Studies.” Other discussions of
interest are given by Berkson,5 Sheps,6 and Goodman and Kruskal.7
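
As a check on the arithmetic, the relative risk can be computed directly from the four cells of Table I. The function name is ours, chosen for illustration; the figures are those of the example in the text.

```python
def relative_risk(a, b, c, d):
    """Relative risk from the 2 x 2 classification of Table I.

    a, b: with the characteristic, developing / not developing the disease
    c, d: without the characteristic, developing / not developing the disease
    """
    incidence_with = a / (a + b)
    incidence_without = c / (c + d)
    return incidence_with / incidence_without


# Figures from the example in the text.
print(relative_risk(1_000, 99_000, 500, 99_500))  # 2.0
```

A value above unity indicates positive association, a value below unity negative association, and a value of exactly unity no association.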


RELATION BETWEEN PROSPECTIVE AND RETROSPECTIVE STUDIES 


Studies which start with populations grouped initially into subclasses, for 
each of which one counts the number of new cases of a disease which develop 
during some subsequent period of time, are ordinarily referred to as “prospective”
or “population-based” studies. The annual incidence of most diseases is
sufficiently small, so that prospective studies designed to supply estimates of the
incidence rate for different classes of the population, or of their ratios, must cover
large numbers of persons. Thus, in a prospective study of lung cancer in a popula- 
tion of 100,000 males over age 40, one might at the end of 1 year of study expect 
to find 50 to 75 new cases. This is a small return for a large effort. The
“retrospective” or “case-control” study provides a more economical way of estimating the
relative risk than the prospective method because it does not require devotion
of a large part of the study resources to those who did not develop the disease. 
In such a study one identifies all, or a well-defined sample, of the new cases of 
a disease as they occur during some period of time, and only after the occurrence 
of the disease does one classify them by the presence or absence of the
characteristic (hence the name “retrospective”). The remainder of the population, i.e.,
those who did not develop the disease during the period, is also sampled and 
similarly classified by presence or absence of the characteristic. Thus, a retro- 
spective study of lung cancer of the same population of 100,000 males over age 
40 would (in principle) uncover exactly the same 50 to 75 newly developed cases 
but would be free to study the characteristics of only a fraction of the remaining 
99,925 to 99,950 males who did not develop lung cancer. 

Retrospective studies might on the surface appear to supply only estimates 
of the proportion of persons with and without the disease who possess the char- 
acteristic and not to estimate relative risk. Such an estimate can easily be derived, 
however.*:? Denote by P the proportion of the population developing the disease 
during the time period of interest, i.e., the incidence rate; denote by $p_1$ the
proportion of those developing the disease who possess the characteristic and by
$p_2$ the proportion of those not developing the disease who possess the
characteristic. All three proportions can be estimated from a prospective study. Thus,
in terms of the notation of Table I: [expressions for P, $p_1$, and $p_2$ not recoverable from the source]

The retrospective study will supply estimates of $p_1$ and $p_2$ but not of P. The
incidence of the disease for those possessing the characteristic is

$$\frac{p_1 P}{p_1 P + p_2 (1 - P)}$$

and for those lacking the characteristic

$$\frac{(1 - p_1) P}{(1 - p_1) P + (1 - p_2)(1 - P)},$$

while the relative risk is the ratio of these expressions, namely

$$\frac{p_1}{1 - p_1} \cdot \frac{(1 - p_1) P + (1 - p_2)(1 - P)}{p_1 P + p_2 (1 - P)}.$$

For investigations in which the retrospective method offers the possibilities of
important economies, P will be sufficiently small so that terms containing it
may be dropped and the relative risk can be written with only trivial error as

$$\frac{p_1}{1 - p_1} \cdot \frac{1 - p_2}{p_2}.$$

This ratio depends only on the two proportions $p_1$ and $p_2$, estimates of which are
supplied by a retrospective study, and not on the over-all incidence, P. Its
calculation can be illustrated using the data of Breslow and co-workers,!° which 
showed that out of 518 patients with carcinoma of the lung 499 were smokers, 
while out of 518 controls 462 were smokers. We thus estimate: 


$$p_1 = 499/518 = 0.9633$$
$$p_2 = 462/518 = 0.8919$$

Relative risk of lung cancer among smokers in terms of unit risk for nonsmokers:

$$\frac{0.9633}{0.0367} \times \frac{0.1081}{0.8919} = 3.18$$
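
The size of the “trivial error” incurred by dropping the terms in P can be checked numerically. The following Python sketch evaluates both the exact ratio and the approximation for the Breslow figures. The function names are illustrative only, and the two incidence values (50 and 75 per 100,000, borrowed from the lung cancer illustration given earlier) are an assumption made simply to indicate the order of magnitude.

```python
def exact_relative_risk(p1: float, p2: float, P: float) -> float:
    """Exact ratio of incidence rates, given p1, p2 and the over-all incidence P."""
    with_characteristic = p1 * P / (p1 * P + p2 * (1 - P))
    without_characteristic = (1 - p1) * P / ((1 - p1) * P + (1 - p2) * (1 - P))
    return with_characteristic / without_characteristic


def approximate_relative_risk(p1: float, p2: float) -> float:
    """Approximation obtained by dropping the terms that contain P."""
    return (p1 / (1 - p1)) * ((1 - p2) / p2)


# Proportions of smokers among cases and among controls (Breslow and co-workers).
p1 = 499 / 518
p2 = 462 / 518
approx = approximate_relative_risk(p1, p2)
print(f"approximate relative risk: {approx:.2f}")

# Assumed incidences, used for illustration only.
for P in (0.0005, 0.00075):
    exact = exact_relative_risk(p1, p2, P)
    print(f"P = {P}: exact = {exact:.4f}, difference = {approx - exact:.4f}")
```

With incidences of this order the exact and approximate figures differ by well under 0.01, which is the sense in which P may be ignored.
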


POTENTIAL SOURCES OF ERROR IN RETROSPECTIVE STUDIES 


This somewhat idealized description of the retrospective study sounds so 
attractive as to make one wonder why any other kind should be considered. 
In the past, one reason for preferring the prospective study was failure to ap- 
preciate that an estimate of relative risk could be supplied by a retrospective 
study as well. But aside from this reason, which should now be of only historical 
interest, there are a number of possible sources of error that can arise in the actual 
conduct of a retrospective study that may take special efforts to eliminate and 
which may require detailed consideration in appraising the results."! 

A basic assumption for estimating relative risk from the retrospective study 
is that it is possible to enumerate all new cases of a disease, or a representative 
sample of them, without having to observe all the individuals in the population 
from which they arise and watching for cases to develop. This assumption might 
be correct if (1) all new patients with the disease sought medical attention, (2) 
all medical sources to which such patients might go were completely canvassed, 
and (3) an effective system for reporting such cases was in operation. In practice 
these conditions may be far from satisfied. Not all new patients seek medical 
attention, and most investigations confine their canvass to only the most
convenient medical source, hospitals. That this may not be sufficient is indicated
by the experience of the excellent register of cancer cases maintained by the State
of Connecticut. Despite the fact that hospitals reporting to the register cover
94 per cent of the approved general hospital beds in the state, of the 76,000
cancer cases registered during the period 1935 to 1951 almost 19,000 were first
discovered by examination of death certificates and consisted largely of patients
who received no hospital care or were not reported.” 

A second and closely related assumption which also requires careful exami- 
nation is that the sample of individuals not developing the disease supplies an 
unbiased estimate of the prevalence of the characteristic under study among the 
entire nondiseased population of interest. Most retrospective studies are content 
to select a “control” group consisting of individuals with some disease other
than that under investigation and to assume that the prevalence of the charac- 
teristic in that control group is an unbiased estimate of the required proportion. 
That this can be a most dangerous assumption is illustrated by Pearl’s!® well- 
known study of the association between cancer and tuberculosis.
From the first 7,500 autopsies performed at the Johns Hopkins Hospital,
Pearl identified 816 individuals with cancer and 816 “control” patients matched
for age, sex, race, and date of autopsy. In the control group 16.3 per cent showed 
active tuberculous lesions, while in the cancer group only 6.6 per cent showed 
such lesions. A difference in the same direction and of the same magnitude per- 
sisted when the material was examined separately for white males, white females, 
nonwhite males, and nonwhite females. Numerous additional checks on the 
same material showed the same negative association, or as Pearl called it
“antagonism,” between the diseases. The possibility that this negative association
was causal was investigated by treating terminal cancer patients with tuberculin. 
We may ask, however, as Wilson™ did at the time and as Wijsman!™ has more 
recently, whether this negative association could even be taken as evidence that 
a population group with active tuberculous lesions at some moment of time would 
subsequently develop less cancer than a group lacking such lesions, whether or 
not the association was causal. But if the control autopsy group supplied a biased 
estimate of the prevalence of active tuberculous lesions among all noncancerous 
individuals in, say, Baltimore, our answer must be no. A recent examination of 
Pearl's original records in the Department of Biostatistics of the Johns Hopkins 
University shows that the control autopsy group included a considerable number 
of individuals dying from tuberculosis, and therefore, necessarily had a higher 
prevalence of active tuberculous lesions than that of all live noncancerous indi- 
viduals. Had the same negative association persisted when the controls consisted 
of those dying of some disease other than cancer and tuberculosis, it could not 
have been so easily attributed to a gross sampling bias. But when Carlson and 
Bell'® used as a control group only those who died from heart disease, they found 
the same prevalence of tuberculous lesions as in the cancer group and could not 
confirm the hypothesis of a negative association. Pearl’s result therefore arose 
not so much from the use of autopsy material as from using it in such a way as 
to obtain a grossly biased estimate of the prevalence of tuberculous lesions in a
live population. Closer attention to representativeness of his controls could
have avoided the error.

In Pearl's study the grossly unrepresentative character of the control sample 
led to an apparently negative association between two diseases. Spurious positive 
associations can also arise. Thus, if patients in whom a disease occurs in a popu- 
lation do not necessarily seek medical attention, but if an individual with both 
disease A and disease B is more likely to seek it than one with A alone, this will 
lead to a spurious positive association.! 
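
How large such a spurious association can become is easily worked out under simple assumptions. The Python sketch below is hypothetical throughout: the function name and all numerical values are ours. It supposes that diseases A and B occur independently in the population, that a case of A comes to medical attention with a probability depending on whether B is also present, and that the controls are a representative sample of persons free of A.

```python
def spurious_relative_risk(prev_b: float, seek_a_only: float, seek_a_and_b: float) -> float:
    """Relative risk of A for those with B, as estimated from cases of A that come to attention.

    prev_b       -- prevalence of B, the same among persons with and without A (independence)
    seek_a_only  -- probability that a person with A alone seeks medical attention
    seek_a_and_b -- probability that a person with both A and B seeks medical attention
    """
    seen_with_b = prev_b * seek_a_and_b
    seen_without_b = (1 - prev_b) * seek_a_only
    p1 = seen_with_b / (seen_with_b + seen_without_b)  # proportion with B among observed cases
    p2 = prev_b                                        # proportion with B among controls
    return (p1 / (1 - p1)) * ((1 - p2) / p2)


# Hypothetical values: B present in 10 per cent of the population.
print(spurious_relative_risk(0.10, seek_a_only=0.5, seek_a_and_b=0.8))
print(spurious_relative_risk(0.10, seek_a_only=0.5, seek_a_and_b=0.5))
```

Under these assumptions the estimated relative risk reduces to the ratio of the two probabilities of seeking attention, so that it departs from unity whenever they differ, although the diseases are unassociated in the population.
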

There are several ways of guarding against the possibility of error arising 
from the unrepresentative nature of the controls. First of all, controls may be 
drawn from a wide variety of disease or admission diagnoses. If the prevalence 
of the characteristic under study varies widely among the groups, the possibly 
unrepresentative nature of at least some of them is strongly indicated.'® It is 
necessary, of course, that enough controls be studied so that if there are important 
differences in the prevalence of the characteristic among the different control 
groups this can be clearly demonstrated. In practice this will often require the 
study of more controls than of disease cases. A second possible way of proceeding
is to draw the control sample from the general population and not from other
disease groups available in the hospital. This introduces a possible source of 
incomparability in the responses of the patients with disease and the controls, 
since the same question may elicit different answers when asked in radically 
different situations. A representative sample of incomparable responses does 
not represent an advance, however, so that the use of general population controls 
is not necessarily a panacea. The use of both general population and a variety 
of hospital controls provides a quite general (but not foolproof) safeguard against 
error from this source. 
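
One way of judging whether the prevalence of the characteristic “varies widely” among control groups is a chi-square test of homogeneity. The choice of this test is ours and is not specified in the text; the counts below are hypothetical, and the function name is illustrative. The value 5.991 is the usual 5 per cent point of chi-square with 2 degrees of freedom.

```python
def chi_square_homogeneity(groups):
    """Pearson chi-square for equal prevalence across control groups.

    groups -- list of (number with the characteristic, total number) per control group
    Returns the statistic and its degrees of freedom.
    """
    total_with = sum(with_char for with_char, _ in groups)
    total_n = sum(n for _, n in groups)
    pooled_prevalence = total_with / total_n
    statistic = 0.0
    for with_char, n in groups:
        expected_with = n * pooled_prevalence
        expected_without = n * (1 - pooled_prevalence)
        statistic += (with_char - expected_with) ** 2 / expected_with
        statistic += ((n - with_char) - expected_without) ** 2 / expected_without
    return statistic, len(groups) - 1


# Hypothetical control groups drawn from three admission diagnoses.
discordant = [(60, 200), (64, 200), (110, 200)]
concordant = [(60, 200), (62, 200), (58, 200)]

CRITICAL_5_PERCENT_2_DF = 5.991
for label, groups in (("discordant", discordant), ("concordant", concordant)):
    statistic, df = chi_square_homogeneity(groups)
    verdict = "heterogeneous" if statistic > CRITICAL_5_PERCENT_2_DF else "no evidence of heterogeneity"
    print(f"{label}: chi-square = {statistic:.2f} on {df} d.f. ({verdict})")
```

With only a handful of controls per group the same proportions would not reach significance, which is the reason for studying more controls than cases.
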

The reporting in the retrospective study of the presence or absence of the 
characteristic after the disease status of the respondent is known introduces 
another potential source of error since conscious or unconscious bias in response 
may arise. The interviewer who believes, for example, that lung cancer is caused 
by excessive smoking might be more zealous in questioning a lung cancer patient 
who gave a nonsmoking history than he would be in questioning a control. A 
patient’s own preconceptions may also influence the answers. The classic example 
is the report by patients with cancer of the breast of prior injury to the affected 
breast with considerably greater frequency, and of prior injury to the unaffected 
breast with considerably less frequency, than was reported by a comparable 
group of controls.'* Similarly, in investigations of familial aggregations of disease, 
families to whom we are led because Susie has the disease may be more likely 
to remember that grandpa also had it than are families of controls without the 
disease. 

Only empirical investigation can determine whether such memory biases are
operating in any particular instance. Thus, Doll and Hill,²⁰ in their investigation
of smoking and lung cancer, interviewed one group of patients who at the time
of the interview had been diagnosed as having lung cancer but who were sub- 
sequently found to be suffering from some other disease. Their reported smoking 
habits, however, were similar to those of persons without lung cancer, even though 
at the time of the interview they had been diagnosed as having the disease. 
This finding seemed to rule out retrospective reporting error as an explanation 
of the Doll-Hill results. Clearly, whenever double-blind interviewing is possible, 
it will control this source of error. 

This catalogue of potential sources of error is not intended to be, neither 
should it be construed as, a blanket condemnation of the retrospective method 
or of any particular set of study findings yielded by it. The magnitude of error 
in any particular case is a substantive issue to be resolved on its own merits. 
Collateral evidence can provide information on possible magnitudes of different 
errors and the size of the spurious association that could result. Sweeping condem- 
nation of the retrospective method or uncritical acceptance of the results of 
single studies are equally to be avoided. The frame of mind which condemns 
any method that could lead to error under some conceivable set of circumstances, 
without also considering whether those circumstances have in fact arisen, is 
unlikely to be satisfied with any result outside the field of pure mathematics. 
The contrary frame of mind, which accepts a method simply because it will 
yield an answer without consideration of how much in error the answer could
be, is scarcely likely to be any more productive. The retrospective study provides
an economical but not a foolproof method of studying certain types of relations. 
Its results, like all results in science, must be checked in a variety of other ways 
before they can be accepted with confidence. 


MEASURES OF ASSOCIATION 

Why take the ratio of the incidence rates of those with and without the 
characteristic as a measure of association? Why not some other combination of 
the two rates, such as the difference? There is no general agreement on the an- 
swers to these questions®® and we can do no more than present our own point 
of view.”! 

It is well to begin by recognizing that no single combination of the two 
incidence rates contains all the information yielded by the two separate rates. 
Given only the difference or the ratio or some other single combination, it is 
not possible to reconstruct the individual rates on which the original combination 
is based. The idea that there is necessarily a single measure which uniquely and 
comprehensively summarizes all aspects of an association seems erroneous. To 
talk, therefore, about the measure of the causal effect of an agent is to talk about 
a hypothetic phenomenon whose existence remains to be demonstrated. We
would suggest that the appropriateness of any measure of association is to be
judged by whether it helps in the understanding of the phenomena under study
and not by any formal mathematical criterion of rightness or wrongness.

From this point of view, both the ratio and the difference of incidence rates 
serve a purpose. Thus, if one accepts any observational associations found as 
causal and inquires about their importance, a natural measure to use is the excess 
number of cases of the disease attributable to the characteristic. This is equivalent 
to taking the difference in incidence rates as a measure of association. If, how- 
ever, the existence of an observational association cannot automatically be 
taken to imply a causal association as well, and this is the usual situation, other 
measures are required. An observational association may merely reflect the 
presence of some other, unknown common cause. It is sometimes suggested, for 
example, that cigarette smoke is not really a causal agent in the development 
of lung cancer but that some special constitutional make-up, perhaps genetic in 
origin, predisposes certain individuals to lung cancer and also makes them 
cigarette smokers. It is in the investigation of such possibilities that the ratio of 
incidence rates is most useful. Thus, cigarette smokers have been reported to 
have a ninefold greater risk of dying of lung cancer than nonsmokers. If this 
observational association is not causal but merely reflects the common effect of 
some unknown third characteristic, one can say that this third characteristic 
must be at least ninefold more prevalent among cigarette smokers than among non- 
smokers.*! If quantitative investigation shows that the relative prevalence 
(cigarette smokers to nonsmokers) of suspected common characteristics is less 
than ninefold, these characteristics cannot by themselves account for the 
observed association. On the other hand, if one is told that the difference in 
annual incidence or mortality rates between cigarette smokers and nonsmokers 
is 40 per 100,000, nothing about the difference in the prevalence of the
postulated third characteristic between cigarette smokers and nonsmokers can be
inferred. 

Similarly, if a single agent is associated with two diseases, we may say that 
the association with the disease having the higher relative risk is less likely to 
be explained by a common third cause. Thus, the 70 per cent elevation in risk 
from coronary heart disease among cigarette smokers that has been reported 
could possibly be explained as the result of a common third characteristic whose 
prevalence among cigarette smokers is twice that among nonsmokers. It would 
be arithmetically impossible, however, for this same characteristic to explain 
the ninefold difference in lung cancer. 
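
The arithmetic behind these statements can be sketched as follows. The Python fragment assumes, purely for illustration, that smoking itself is inert and that a third characteristic multiplies the risk of disease by a constant factor in smokers and nonsmokers alike; the function name and the prevalences are hypothetical.

```python
def apparent_relative_risk(prev_smokers: float, prev_nonsmokers: float, effect: float) -> float:
    """Relative risk (smokers to nonsmokers) produced solely by a third characteristic.

    The incidence in each group is proportional to 1 + (effect - 1) * prevalence,
    where `effect` is the factor by which the characteristic multiplies the risk.
    """
    return (1 + (effect - 1) * prev_smokers) / (1 + (effect - 1) * prev_nonsmokers)


# Hypothetical prevalences: twice as common among smokers as among nonsmokers.
prev_smokers, prev_nonsmokers = 0.6, 0.3
for effect in (2, 10, 100, 1_000_000):
    rr = apparent_relative_risk(prev_smokers, prev_nonsmokers, effect)
    print(f"effect x{effect}: apparent relative risk = {rr:.3f}")
```

However strong the effect of the third characteristic, the apparent relative risk approaches but never exceeds the ratio of the two prevalences, here 2. A characteristic of this kind could therefore account for a 70 per cent elevation in risk but not for a ninefold one.
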

A second useful purpose served by the relative, but not the absolute, mea- 
sure is in the refinement of classification. A priori considerations will not often 
indicate exactly what the classification by characteristic should be; neither 
will they indicate exactly how the disease should be defined. Thus, smokers may 
be defined as those who smoke either cigarettes, cigars, or pipes or as those who 
smoke only cigarettes. Lung cancer may be defined to include all histologic types 
or restricted to epidermoid carcinoma of the lung. There is a precise sense?! in 
which one can say that the best classification on both the characteristic and the 
disease axes is the one leading to the largest relative risk. Thus, the stronger 
association of cigarette smoking with epidermoid carcinoma of the lung than 
with adenocarcinoma, as reflected in a larger relative risk for the former, suggests 
that adenocarcinoma may not be related to smoking. Use of the differences rather 
than the ratios of incidence rates would not reveal this. 
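
A small numerical illustration may make the point concrete. The incidence rates below are hypothetical, chosen only so that one histologic type is strongly associated with smoking and the other not at all; the Python names are likewise ours.

```python
# Hypothetical annual incidence rates per 100,000, for illustration only.
rates = {
    "epidermoid carcinoma": {"smokers": 45.0, "nonsmokers": 5.0},
    "adenocarcinoma": {"smokers": 10.0, "nonsmokers": 10.0},
}


def ratio_and_difference(smokers: float, nonsmokers: float):
    """Return (ratio of rates, difference of rates)."""
    return smokers / nonsmokers, smokers - nonsmokers


summary = {
    name: ratio_and_difference(r["smokers"], r["nonsmokers"]) for name, r in rates.items()
}
# Broadening the disease definition to all lung cancer adds the rates together.
summary["all lung cancer"] = ratio_and_difference(
    sum(r["smokers"] for r in rates.values()),
    sum(r["nonsmokers"] for r in rates.values()),
)

for name, (ratio, difference) in summary.items():
    print(f"{name}: ratio = {ratio:.2f}, difference = {difference:.0f} per 100,000")
```

In this illustration the difference in rates is the same whether the disease is defined as epidermoid carcinoma alone or as all lung cancer, so that it gives no guidance on the choice of definition, whereas the ratio falls when the unrelated type is included.
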

Finally, the incidence ratio provides some indication of the importance of 
characteristics other than the one being studied. If the characteristic under study 
is only one of many independent characteristics associated with the disease, the 
ratio will be closer to unity than if this is not the case. The presence of many 
other possible causes, furthermore, indicates the necessity of exercising great 
caution in attributing causal significance to the observed association for any one 
characteristic. 

It has been suggested®:® that when other causes are present it is incorrect, or 
at least inappropriate, to measure effects using the ratio of observed incidence 
rates, since these are compounds of more fundamental rates. But the rates
postulated in competitive risk models are themselves compounds of more fundamental
constants, such as reaction velocities, which could in their turn be deduced from
even more fundamental physical considerations. The process of expanding
scientific understanding is a never-ending one, and all that can reasonably be asked
of a descriptive or analytic method is that it contribute to this process. 


ELIMINATING THE EFFECTS OF OTHER VARIABLES 

The discussion up to this point has assumed that the measures of association 
can be computed without regard to the effects of other known or suspected vari- 
ables. This assumption will rarely be true, however, and methods of eliminating 
the possible effects of other variables must be considered. If extraneous variables 
are very highly correlated with the characteristic under study, or if they are 
too ill-defined to be measurable, of course little can be done to eliminate their
effects in observational studies. Two general methods are available for eliminating
the effects of variables that do not fall in this category. The first involves match- 
ing disease and control cases with respect to the control variables. The second 
calls for the selection of independent, unmatched samples of cases of disease and 
controls, but with the effects of extraneous variables eliminated in the subsequent 
statistical analysis. It is possible to combine both procedures by matching on 
some variables (normally, those whose importance has been previously estab- 
lished) and analyzing for the effects of others. 

The most general procedure of analysis involves cross classification of each 
of the two samples with respect to the extraneous variables and separate (at 
least in principle) computation of the association between the characteristic of 
interest and disease status for each basic cell of the cross classification. Extensive 
cross classifications require an abundance of observational material. In such 
cases it is natural to group quantitative variables into broad classes. The loss 
attributable to overly coarse grouping does not appear to have been studied, 
although results of Cox” on a different but related problem indicate that as few 
as two classes of equal size will retain two-thirds of the information present, 
while four equal-size ones will retain 86 per cent. The analysis of covariance as 
an alternative to cross classification is sometimes employed,” although usually 
there are restrictive assumptions made as to the absence of interactions and the
presence of only linear effects. Postmatching is a device by which the samples of
cases of disease and controls, originally selected independently, are matched at
the conclusion of field work; data for unmatched subjects in both groups are 
discarded. In addition to its obvious lack of efficiency, this method has the further 
disadvantage of making the study of interactions difficult. 
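
The grouping figures quoted above can be reproduced under one particular reading, which is our assumption and not necessarily the calculation made in the paper cited: the variable is taken to be normally distributed, and the “information retained” is taken as the fraction of its variance preserved when each observation is replaced by the mean of its class. A Python sketch, using only the standard library:

```python
from statistics import NormalDist


def retained_fraction(k: int) -> float:
    """Fraction of the variance of a standard normal variable retained by k equal-size classes."""
    z = NormalDist()  # standard normal distribution, an assumption of this sketch
    cuts = [z.inv_cdf(i / k) for i in range(1, k)]
    density_at_cuts = [z.pdf(c) for c in cuts]
    lower = [0.0] + density_at_cuts  # density at the lower boundary of each class
    upper = density_at_cuts + [0.0]  # density at the upper boundary of each class
    # Mean of the variable within a class of probability 1/k.
    class_means = [(lo - hi) * k for lo, hi in zip(lower, upper)]
    # Variance of the class means (the over-all mean is 0 and the total variance is 1).
    return sum(m * m for m in class_means) / k


for k in (2, 4):
    print(f"{k} equal-size classes retain {retained_fraction(k):.1%}")
```

On this reading two classes retain about 64 per cent (roughly two-thirds) and four classes about 86 per cent.
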

Prematching, although much used, is usually difficult to carry out in the 
field because controls which match on a series of variables are not easy to obtain. 
As in postmatching studies, if interactions among variables exist, the interpreta- 
tion of results may be equivocal. Furthermore, the effects of the matched variables 
themselves on the disease cannot be studied. 

The analysis of tables of multiple classification is too large a subject to enter 
into here. One aspect, however, the estimation of relative risk, requires mention. 
For each cell of a multiple classification one obtains an estimate of relative risk, 
say r. Some method of combining the estimates for the different cells is required. 
A combined estimate in the form of a weighted average seems reasonable, but 
different criteria in choosing weights can lead to different results. Mantel and 
Haenszel'® propose as a compromise weight for a cell 

$$\frac{p_2 (1 - p_1)}{1/n_1 + 1/n_2},$$

where $p_1$ and $p_2$ are (as explained previously) the proportions with the
characteristic in the disease and control groups in that cell and $n_1$ and $n_2$ are the
total numbers of cases of disease and controls studied in that cell. This is an easy
estimate to compute since it reduces, in the notation of Table I, to

$$\text{Combined relative risk} = \frac{\sum ad/N}{\sum bc/N}.$$

It is particularly important to note that common methods of combining cells,
such as pooling, or analogues of the direct or indirect method of age-adjustment™ 
yield results that need not be weighted averages of the individual relative risks
and can, therefore, give an over-all relative risk which is entirely outside the 
range of the relative risks observed for each cell. As an extreme illustration of
what could conceivably happen, Mantel and Haenszel give the following example.
Consider two cells, in the first of which $p_1 = 0.05$ and $p_2 = 0.01$ and in the second
of which $p_1 = 0.99$ and $p_2 = 0.95$. Each cell thus gives a relative risk of
approximately 5. If equal numbers had been covered in each cell and the two cells were
combined by pooling, however, the relative risk of the combined cell becomes not
5, but 1.3. Although such an extreme result might not often be encountered in
practice, methods of estimation that can lead to it cannot be generally
recommended.
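
The contrast between the weighted estimate and pooling can be sketched in Python as follows. Two things are assumed and are not taken from the text: that each cell is described by the four counts (cases with the characteristic, cases without, controls with, controls without), which is how the symbols a, b, c, and d are interpreted here, and that the two-cell example contains 100 cases and 100 controls per cell. The pooled figure depends on the allotment assumed and need not reproduce exactly the value quoted above.

```python
def cell_relative_risk(cell):
    cases_with, cases_without, controls_with, controls_without = cell
    return (cases_with * controls_without) / (cases_without * controls_with)


def weighted_average(cells):
    """Average of the per-cell relative risks with weight p2(1 - p1) / (1/n1 + 1/n2)."""
    total_weight = 0.0
    total_weighted_risk = 0.0
    for cell in cells:
        cases_with, cases_without, controls_with, controls_without = cell
        n1 = cases_with + cases_without        # cases of disease in the cell
        n2 = controls_with + controls_without  # controls in the cell
        p1 = cases_with / n1
        p2 = controls_with / n2
        weight = p2 * (1 - p1) / (1 / n1 + 1 / n2)
        total_weight += weight
        total_weighted_risk += weight * cell_relative_risk(cell)
    return total_weighted_risk / total_weight


def ratio_of_sums(cells):
    """The same estimate written as sum(ad/N) / sum(bc/N)."""
    numerator = sum(cw * kwo / (cw + cwo + kw + kwo) for cw, cwo, kw, kwo in cells)
    denominator = sum(cwo * kw / (cw + cwo + kw + kwo) for cw, cwo, kw, kwo in cells)
    return numerator / denominator


def pooled(cells):
    """Relative risk from the single table obtained by adding the cells together."""
    totals = [sum(column) for column in zip(*cells)]
    return cell_relative_risk(totals)


# Assumed allotment: 100 cases and 100 controls in each of the two cells.
cells = [(5, 95, 1, 99), (99, 1, 95, 5)]
print("per cell:        ", [round(cell_relative_risk(c), 2) for c in cells])
print("weighted average:", round(weighted_average(cells), 2))
print("ratio of sums:   ", round(ratio_of_sums(cells), 2))
print("pooled:          ", round(pooled(cells), 2))
```

The two forms of the weighted estimate agree and remain at the per-cell value of about 5, while the pooled figure falls close to unity.
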

The warning as to the potentially misleading nature of the estimate of
relative risk yielded by pooling applies to matched-sample studies as well. These
may be considered as a special case of multiple classification with the number of
cells studied equal to the number of pairs included. In such a study any pair
can yield one of four possible results: (1) a disease case with the characteristic, 
control case without it, (2) disease case without, control case with, (3) both with, 
and (4) both without. Call the number of pairs of the first type a, of the second 
type b, of the third type c, and of the fourth type d. Kraus,” in a consideration
of this problem, recommended as an estimate of relative risk a/b. Interestingly 
enough, the weighted risk proposed by Mantel and Haenszel for the general 
multiple classification case reduces to a/b for the matched sample case, as does 
the maximum likelihood estimate of the relative risk on the assumption that all 
pairs have the same relative risk. The estimate which would result from pooling
the data in all four categories,

$$\frac{(a + c)(a + d)}{(b + d)(b + c)},$$

is different, however, and not easy to defend.
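
The reduction of the weighted estimate to a/b can be verified by treating every pair as a cell containing one case and one control. In the Python sketch below the numbers of pairs are hypothetical, and each cell is written as (case with, case without, control with, control without), a layout assumed for illustration.

```python
def pair_cells(a: int, b: int, c: int, d: int):
    """Expand the four pair counts into one two-person cell per pair."""
    return (
        [(1, 0, 0, 1)] * a    # case with the characteristic, control without
        + [(0, 1, 1, 0)] * b  # case without, control with
        + [(1, 0, 1, 0)] * c  # both with
        + [(0, 1, 0, 1)] * d  # both without
    )


def weighted_estimate(cells):
    """Sum over cells of (case with x control without)/N, divided by the
    sum of (case without x control with)/N; N = 2 for every pair."""
    numerator = sum(cw * kwo / 2 for cw, cwo, kw, kwo in cells)
    denominator = sum(cwo * kw / 2 for cw, cwo, kw, kwo in cells)
    return numerator / denominator


def pooled_estimate(a: int, b: int, c: int, d: int) -> float:
    """Relative risk from the single table formed by ignoring the pairing."""
    return ((a + c) * (a + d)) / ((b + d) * (b + c))


# Hypothetical numbers of pairs of the four types.
a, b, c, d = 40, 10, 30, 20
print("a/b:              ", a / b)
print("weighted estimate:", weighted_estimate(pair_cells(a, b, c, d)))
print("pooled estimate:  ", pooled_estimate(a, b, c, d))
```

Pairs in which case and control agree contribute nothing to either sum, so that only the a and b pairs enter the weighted estimate; the pooled figure, by contrast, depends on c and d as well.
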
Household morbidity surveys may be used to gather data for a number
of different purposes. Such surveys provide, for example, statistics
pertinent to the administration of health programs, the evaluation of present
medical services, facilities, and personnel, the estimation of the demand for 
certain health-related products, and the investigation of etiological factors 
in illness.! Since this symposium is concerned with the more strictly medical 
aspects of illness rather than with its social ramifications, household surveys 
will here be considered primarily in terms of their contribution to conventional 
medical research. 

The scope of this discussion will be further limited to exclude the basic 
methodologic question as to how much of value to medicine can be learned 
through the use of an observational or epidemiologic approach as opposed to 
laboratory or experimental methods. That issue is fully discussed in the article 
by Lilienfeld in this symposium. It will here be simply taken for granted that 
valuable etiological cues are frequently to be gained from the study of the 
differential incidence or prevalence rates of a given disease in ordinary popula- 
tions, in spite of the problems of inference resulting from uncontrolled and 
nonrandomized variables the effects of which cannot readily be taken into 
account. 

Those problems in the interpretation of household survey data which are 
similar to those encountered in the interpretation of other types of observational 
data have been fully discussed elsewhere and will therefore be ignored in the 
present discussion.?~® 

The merits of morbidity interview surveys will be taken up first, followed 
by a discussion of their chief weakness, the problem of response validity. It will 
be noted that the benefits to be derived from the conduct of interview surveys
are not unique to that particular technique of data collection. Similar advantages
are to be enjoyed from studies involving the clinical evaluation of random
samples of broad segments of the general population, but financial and certain logistic
considerations act as a barrier to the exclusive reliance on such clinicoepidemio- 
logic investigations. Although a number of studies of this latter type,’-'! of both 
a cross-sectional and longitudinal character, have shown and are showing tre- 
mendous promise, they have not as yet completely superseded surveys based 
on lay reports of illness. The evaluation which follows will be focused specifically
on surveys where neither diagnostic examinations nor other than verbal mass- 
screening devices are employed in the determination of the health status of the 
individual. While the section on the strengths of household surveys can be con- 
sidered applicable to the clinicoepidemiologic as well as to the lay report type of 
inquiry, the issues raised in the section on validity are more peculiarly relevant
to lay report surveys.


MERITS OF INTERVIEW SURVEYS 


Breadth of Population Coverage.—It is a fundamental tenet of the epidemio- 
logic approach to disease that a multiplicity of variables has a bearing on the 
probability that a given individual will develop a given disease at a given time.°® 
According to this view of Nature, disease is more profitably to be conceived as 
the outcome of the interplay between a vast array of forces than as the simple
consequence of exposure to a limited number of exogenous agents.* The notion 
of the interplay of causative factors is the aspect of the epidemiologic viewpoint 
which is of greatest importance here.":'* The effects exerted by the different host, 
agent, and environmental variables cannot be viewed as being simply additive. 
Higher order interactions obviously are common. A particular constitutional 
state may be conducive to the development of a particular disease for individuals 
with certain activity patterns but not for individuals with other activity patterns. 
For instance, the usual association between overweight and excess mortality 
was recently found to be absent among longshoremen.'® This type of interaction 
among variables is probably the rule rather than the exception. 

If generalizations concerning the differential incidence of disease in relation 
to varying circumstances are to be useful, either for purposes of understanding 
or control, their limits of applicability must be highly specific. The relationship 
of given variables to morbidity must be examined under many different conditions 
of interaction with other variables. In other words, there are distinct advantages 
to the diversification of the investigator’s observational base.'*” It is, of course, 
desirable that there be statistical control over the heterogeneity—we should like 
to examine relationships between the variables of special interest under various
specifiable sets of circumstances. Ideally, the correlations would be assessed within
a number of subgroups differing widely from each other but internally
homogeneous with respect to the variables being treated as contextual in the particular
investigation. But even if this ideal cannot be attained and the population cannot
be divided into relatively homogeneous subgroups, there may be advantages to the
study of a population with the same level of heterogeneity as the population in
which it is hoped to control the disease in question.

*Belief in the irreducible complexity of biologic phenomena need not lead one to view “observational”
data as being of particular value. Ingle,” for instance, propounds a conception of the universe
involving just as complex a system of causality as that held by epidemiologists but arrives at somewhat
different conclusions as to the rewards to be expected from different research strategies.

What has been said here about the need for a broad observational base is 
indeed an epidemiologic platitude. At first glance, it would appear that the many 
recent comparative studies involving the contrast of illness rates among diverse 
population groups are quite adequate.'’-?° But such investigations at best involve 
correlations on the aggregate level. Mortality rates for groups are related to 
other averages for these groups. If the disease being studied is an infectious one, 
correlations on the aggregate level are much to be desired since the likelihood that 
an individual will contract a disease may well be a function of the living 
conditions in the broader community as well as the individual's own personal 
environment. 

In the case of the degenerative diseases, the situation may be somewhat 
different. The personal situation of the individual and his own life experiences 
are likely to be of chief interest. Thus, correlations on the aggregate level may 
leave much to be desired, since such relationships between group averages may 
lead to conclusions quite different from cross-sectional correlations—those on 
the level of individuals.”!:> Furthermore, when dealing with group characteristics 
it is difficult to take into account more than two or three variables at once. It 
is generally impossible to collect and collate data pertaining to groups sufficient 
in number and adequately representing the various potential combinations of 
characteristics to permit inferences with respect to more than a few variables 
at a time. Given the preceding conclusions concerning the interaction between 
factors in their influence on biologic phenomena, the necessity of dealing with 
variables seriatim rather than simultaneously constitutes a serious handicap. 
While this is not to deny the value of comparing incidence rates among popu- 
lation groups, it is obviously far more desirable to be able to characterize each 
individual in terms of a number of variables specific to him than to have to
depend on the aggregate characteristics of one of the groupings to which he
belongs.

The household morbidity survey provides a technique for collecting data
pertaining to a diversity of situations with the individual as the statistical unit.
The total population in which the investigator is interested can be readily repre- 
sented and, if the sample is large enough, higher order interactions between 
factors can be examined. Thus, for instance, the relationship of a high-fat diet 
to the development of a given medical condition could be compared for indi- 
viduals engaged in differential degrees of physical activity coupled with varying 
degrees of emotional tension. Obviously the uncertainties of inference from 
nonexperimental data would still plague such an inquiry, but it is likely that 
more valuable etiological cues would be forthcoming than from studies based on 
homogeneous populations or from studies where the decisive variables are in 
essence confounded. Thus, perhaps the primary merit of the household survey 
approach is the broadening of the observational base it permits while enabling
one to conduct analyses with the individual rather than the group as the statistical
unit.* 
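
The contrast drawn above between correlations on the aggregate level and those on the level of individuals can be illustrated with a small numerical sketch. The data below are entirely hypothetical and were constructed so that the two levels disagree; the function `pearson` and all variable names are assumptions introduced for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical data: three population groups (k = 0, 1, 2), five persons each.
# Within every group the two characteristics move in opposite directions,
# while the group averages of both rise together from group to group.
deviations = [-10, -5, 0, 5, 10]
individuals = [(k, k + d, k - d) for k in range(3) for d in deviations]

# Individual level: one (x, y) pair per person.
r_individual = pearson([x for _, x, _ in individuals],
                       [y for _, _, y in individuals])

# Aggregate level: one (mean x, mean y) pair per group.
def group_mean(k, idx):
    vals = [row[idx] for row in individuals if row[0] == k]
    return sum(vals) / len(vals)

mean_x = [group_mean(k, 1) for k in range(3)]
mean_y = [group_mean(k, 2) for k in range(3)]
r_aggregate = pearson(mean_x, mean_y)

print(f"individual-level r = {r_individual:.3f}")
print(f"aggregate-level  r = {r_aggregate:.3f}")
```

In this contrived case the group averages are perfectly and positively related, while the relationship among individuals is strongly negative. Real data would rarely be so extreme, but the sketch shows why an inference from group averages to individuals may fail.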

Comparability of Numerator and Denominator Classifications.—The most
economic source of data concerning those afflicted with certain diseases is a 
clinical population. It is possible to gather a great deal of data concerning the 
relevant life situations, experiences, and practices of the persons with a disease, 
those appearing in the numerator of a rate. In order to interpret such data, it 
is necessary to know how many people were exposed to the risk of appearing in 
this numerator.”** For instance, if we know the number of heavy smokers in a 
given locale who developed lung cancer during a specified time period, we also 
need to know how many heavy smokers there were in this locale at that time, as 
well as, of course, the comparable figures for those who were not heavy smokers. 
It is crucial that the smoking practice classification be of equal validity among 
those who developed cancer and those who did not if utterly misleading results 
are to be avoided. Yet, if the data on smoking habits among the afflicted are
collected through one mechanism and the data concerning the general popula- 
tion are collected through some other, it is extremely difficult to ensure an 
adequate degree of comparability in the smoking classifications. It is thus an 
advantage of the household survey that classificatory data for the numerator 
and denominator groups are collected in an identical fashion and should thereby
be of more nearly comparable validity than independently collected sets of
data.
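
The effect of unequal validity in the numerator and denominator classifications can be sketched with hypothetical figures. In the example below there is, by construction, no true relationship between heavy smoking and the disease. The 20 per cent misclassification figure, the population sizes, and the function `rate_ratio` are all assumptions chosen for illustration, not figures from any study.

```python
from fractions import Fraction

def rate_ratio(cases_exposed, cases_unexposed, pop_exposed, pop_unexposed):
    """Ratio of the disease rate among the exposed to that among the unexposed."""
    return Fraction(cases_exposed, pop_exposed) / Fraction(cases_unexposed, pop_unexposed)

# Hypothetical locale: 10,000 heavy smokers and 40,000 others, with the
# same true rate of disease (1 per 1,000) in both classes.
true_ratio = rate_ratio(10, 40, 10_000, 40_000)

# Two collection mechanisms: the numerator source classifies every case
# correctly, while the separate denominator source records 20 per cent of
# heavy smokers (2,000 persons) as not being heavy smokers.
pop_heavy_seen = 10_000 - 2_000
pop_other_seen = 40_000 + 2_000
mixed_ratio = rate_ratio(10, 40, pop_heavy_seen, pop_other_seen)

# One collection mechanism: the same 20 per cent of heavy smokers are
# misclassified among cases and non-cases alike (2 of the 10 cases).
same_ratio = rate_ratio(10 - 2, 40 + 2, pop_heavy_seen, pop_other_seen)

print(true_ratio, mixed_ratio, same_ratio)
```

Under these assumptions the two-mechanism comparison manufactures an apparent excess risk among heavy smokers (a ratio of 21/16) where none exists, while the single-mechanism comparison leaves the ratio at 1. When a true relationship exists, identical collection would not remove all error, but the errors would at least be of comparable character in numerator and denominator.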

Quality of Sample.—By applying modern knowledge of sampling design and 
techniques of eliciting respondent cooperation, it is possible to conduct inter- 
views among an extremely high proportion (often well over 90 per cent) of the 
selected cases from an efficient sample representing a clearly demarcated, speci- 
fiable population. This constitutes a clear-cut advantage of the household survey 
over certain alternate techniques of data collection.®:?*:*6 

The problems of interpreting correlations derived from data bearing on
subpopulations have been so fully treated by Berkson*?’ and others*.?* that
they hardly require amplification here. Unless the probability that an individual
is part of the subpopulation is completely independent of his position on the
explanatory variable in both the disease and control groups, or unless the disease
and control groups are mutually exclusive and the probability of appearing
in the subpopulation is equal for members of the disease and control groups, the
relationship observed in the subpopulation may very well differ from the one
which would have been observed in the “total” population.

*Although epidemiology is frequently distinguished from clinical research on the basis of the former's
reliance on intergroup differences in contrast with the latter’s concern with the individual case, the
above discussion should not be taken as suggesting an affinity between the household survey and the
clinical approach. While in both instances we are able to characterize each individual in terms of a
number of properties specific to him or his own situation, the similarity ends there. All that is usually
meant by the epidemiologic-clinical distinction is that epidemiology is probabilistic or statistical while
clinical investigation can be characterized as ultimately seeking universals, even though the immediate
concern may be the thorough description or explanation of the individual case. The contrast made in
this paper between the use of the individual as against the use of the group as the statistical unit has
nothing to do with the epidemiologic-clinical distinction. Here, only the epidemiologic approach is being
considered. Either the characteristics of a number of individuals or the averaged characteristics of the
members of a number of groups can be viewed statistically, but far more powerful and flexible analyses
are feasible when the data are on the level of the individual.
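
The selection problem described above can be made concrete with a hypothetical fourfold table. In the sketch below the total population shows no association between the explanatory variable and the disease; the selection probabilities, the cell counts, and the helper names are assumptions invented for the illustration.

```python
from fractions import Fraction

CELLS = ("disease_exposed", "disease_unexposed",
         "control_exposed", "control_unexposed")

def odds_ratio(table):
    """Odds ratio of a fourfold table keyed by the four CELLS names."""
    a, b, c, d = (Fraction(table[cell]) for cell in CELLS)
    return (a * d) / (b * c)

def expected_subpopulation(population, selection):
    """Expected cell counts after each cell is sampled with its own probability."""
    return {cell: population[cell] * selection[cell] for cell in CELLS}

# Hypothetical total population with no association (odds ratio = 1).
population = {"disease_exposed": 100, "disease_unexposed": 900,
              "control_exposed": 1_000, "control_unexposed": 9_000}

# Selection depends on the explanatory variable among controls only.
dependent = {"disease_exposed": Fraction(1, 2), "disease_unexposed": Fraction(1, 2),
             "control_exposed": Fraction(1, 5), "control_unexposed": Fraction(1, 10)}

# Selection independent of the explanatory variable within each group.
independent = {"disease_exposed": Fraction(1, 2), "disease_unexposed": Fraction(1, 2),
               "control_exposed": Fraction(1, 10), "control_unexposed": Fraction(1, 10)}

or_total = odds_ratio(population)
or_dependent = odds_ratio(expected_subpopulation(population, dependent))
or_independent = odds_ratio(expected_subpopulation(population, independent))

print(or_total, or_dependent, or_independent)
```

With these figures the subpopulation selected in a manner dependent on the explanatory variable yields an odds ratio of 1/2, a spurious negative association, whereas selection that is independent of the explanatory variable within each group preserves the odds ratio of 1 even though the diseased and the controls are sampled at different rates.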

It must be admitted that an element of arbitrariness enters into what we 
consider to be the total relevant population and what we consider to be a sub- 
population. Conceivably, someone might insist that the relevant population 
consists of all humans who have ever lived or will ever live, although such a 
choice perhaps carries the notions of higher order interaction or the contingency 
of generalizations to a ridiculous extreme. In any event, the household survey 
provides a mechanism for collecting data representative of populations of greater 
breadth than those which can generally be studied through existent records of 
the providers of care or of public and private agencies. Actually, the need to 
base investigations on cases representative of the total relevant population is 
as great in experimental as in epidemiologic research, but this need usually comes 
far closer to fulfillment in the latter area of research than in the former. 


Far higher sample completion rates have generally been attained in connec- 
tion with interview surveys than in connection with surveys involving more 
formal diagnostic procedures.**-** Since samples depleted through the refusal 
of cooperation by substantial segments are subject to the same sorts of bias 
mentioned above in connection with the use of subpopulations to represent larger 
populations, the higher completion rate of interview surveys could be viewed
as giving them a marked advantage over diagnostic surveys. But the success
of several recent pretests in the health examination program of the National 
Health Survey as well as the success of Cochrane in England suggest that the 
sample completion rate for diagnostic surveys can possibly be raised to an 
adequate level.’ *4 

Breadth of Classifying Data.—Through the use of an interview survey, it is 
feasible to collect an extremely wide array of data concerning factors of possible 
etiological significance for disease. Information concerning the social, psychologic, 
and economic aspects of each individual's life situation as well as his past ex- 
periences and family medical history can be determined with varying degrees of 
validity through this approach and can be related to the incidence or prevalence 
of disease.?*:*-37 Admittedly, there are a great many problems of inference owing 
to the usually retrospective character of such investigations, the dubious validity 
of certain of the measurements, and related factors. Nevertheless, the extremely 
broad range of factors which can be explored through the interview survey
makes it a particularly suitable instrument for sorting out the multitude of
alternate etiological hypotheses which abound during the early stages of inquiry 
into those which seem to warrant more intensive follow-up by other methods 
and those which do not seem to be particularly promising. 


INACCURACY OF DIAGNOSES DERIVED FROM INTERVIEW SURVEYS 


What is the “True Diagnosis”?—A distinction is frequently made between
the subjective and objective scales of illness.**.*4.*5.3° This draws attention to the 
fact that there is by no means a perfect correlation between the sensations of 
pain or discomfort experienced by individuals and the tissue damage, anatomic
defects, abnormality of body chemistry, etc., which could be discovered by
autopsy or the most revealing laboratory tests known. While medicine, in practice, 
takes into account both the organic symptoms and the subjective experiences 
of the patient, at our current stage of knowledge we are prone to give precedence 
to the more objective aspects of disease. Whatever useful generalizations are to 
be established at present on the basis of the observed regularities in the natural 
history of a disease are likely to be tied more closely to the organic processes 
than to intrapsychic processes, insofar as these two aspects of existence diverge. 
Even while eschewing an unfashionable mind-body dualism, it is reasonable to 
take the results of autopsy and the laboratory as the standard of diagnostic 
accuracy for an epidemiologic investigation. This is in no sense to imply that 
information bearing on the subjective component of illness will in the end turn 
out to be any less valuable to society than knowledge of organic processes! :*4:?° 
it is simply that the functions of the two types of knowledge are different and 
we are here concerned with medical research rather than social planning. 

It may appear obvious enough that autopsy and laboratory findings con- 
stitute the ultimate criteria for assessing the adequacy of lay reports of illness 
for medical purposes, yet it should be recognized that both what is noted by the
pathologist and what he makes of his perceptions are largely a function of the state
of medical thinking at the time. The medical classification of complex organic
processes into specific disease categories is completely arbitrary. Unquestionably,
the conceptual apparatus of differential diagnosis is extremely functional
and probably even essential to the most effective treatment of illness and to the 
discovery or contrivance of improved therapeutic measures. Still, it must be 
realized that what is considered a disease during one epoch is considered a symp- 
tom in another and that the boundaries between diseases are in continuous 
flux. In fact, whether or not a given organic condition is generally considered 
as pathologic depends substantially on the time and place at which the diagnosis 
is being made.**-” 

There is nevertheless little choice but to treat the conceptualization current 
at the present stage of medical development as inviolable. Unfortunately, the 
acceptance of current doctrine does not completely solve the problem of estab- 
lishing a satisfactory standard for diagnoses, since the findings of the most 
comprehensive clinical examination that could currently be devised would still 
be subject to considerable error as compared to the findings of an autopsy. In 
this connection, Breslow?® has warned against the uncritical acceptance of clinical 
assessments because of the inaccessibility of many organic developments to the 
clinician. While Breslow has stressed the cyclic character of the manifestations 
of many diseases and the consequent unreliability of the single clinical examina- 
tion, it has long been assumed and in fact has been recently demonstrated that 
there is considerable variability among the clinical observations or diagnoses 
made at a given time by different observers.*!:**-* Nevertheless, in the absence 
of a more definitive criterion, findings by clinical examination will here be ac- 
cepted as of unquestionable validity. 

Since the degree to which lay reports of illness correspond with clinical diagnoses
is the most important variable in the assessment of the medical usefulness
of the interview survey, this question will here be treated in far greater detail
than were the merits of the technique in the preceding sections. This imbalance 
is also justified by the absence from the literature of a systematic evaluation of 
the various studies dealing with the validity of lay reports in contrast to the 
plethora of discussions dealing with the merits of surveys. 

Although it was in the past recognized that interviews with lay respond- 
ents result in lower estimates of the prevalence of many conditions than would 
thorough physical examinations,*® it is only recently that the extent of this 
discrepancy has been determined with any precision. On the basis of these recent 
studies, it must be concluded that the underreporting is of a much greater magni- 
tude than had previously been suspected. The two household surveys??*!:4.47 
of the Commission on Chronic Illness were extremely well-designed and well- 
executed endeavors. Unusually detailed interview schedules were used, and the 
enumerators were as well trained and competent as any who would be likely to 
be employed on any future survey of this type. Yet in each study it was possible
to find, among the conditions and symptoms reported in household interviews,
even an approximate match for less than one-fourth of the clinically diagnosed
cases of chronic conditions. In other words, there was scarcely a hint in
the interview material of the existence of three-fourths of the clinically deter- 
mined cases of chronic illness. Furthermore, the results are only slightly better 
if the test is restricted to conditions rated as disabling by the examining clinician 
or to conditions sufficiently symptomatic that they “could have been reported.”

It should be understood that there is a large element of arbitrariness in the
particular “adequacy rate”*® which emanated from the two studies. By manipulating
various facets of the design and conduct of these tests of the household
interview, it would have been possible to have either raised or lowered the degree 
of correspondence between the interview and clinical data. For instance, im- 
provements in the interview schedule might have led to a somewhat higher 
level of agreement, while the employment of more stringent standards in match- 
ing diagnoses from the two sources might have reduced the level of agreement. 
A number of such incidental factors will be considered here to provide grounds 
for assessing how broadly applicable the findings of these particular tests are. 

Single-Visit Versus Periodic-Visit Surveys.—Single-visit, rather than periodic-visit,
interview surveys were employed in both tests. It seems likely that
owing to the cyclic character of at least the more obvious manifestations of many 
diseases, a periodic-visit survey will usually (all other things being equal) uncover 
a greater volume of cases of chronic illness than a single-visit survey. A condition 
which is quiescent during the initial wave of interviewing and which the informant 
may therefore forget to report, or which he may incorrectly consider to be cured, 
may be in a more active stage at the time of some later wave of interviewing. 
Or, even aside from conditions of fluctuating intensity, repeated interviewing 
could still elicit additional reports of conditions already prevalent at the time of 
the initial interview simply through the supplementary prodding of the infor- 
mant’s memory. A condition which was forgotten at first may be recalled at a 
later time, simply because the informant may happen to have fewer other illnesses
to report at the later time or because some recent occurrence may have
brought it to mind. In assessing the prevalence of a chronic
condition as of a given time, reports made in interviews conducted both prior and 
subsequent to the time in question could be combined to minimize the recall 
problem. Thus, there are sound a priori grounds for assuming that a periodic-visit 
survey might have produced results more nearly comparable to those of the 
clinical evaluation than did the single-visit Commission on Chronic Illness
surveys. 
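
The proposal to combine reports from interviews conducted before and after a given date amounts to a cumulative tabulation across waves. A minimal sketch follows, using invented reports for a single hypothetical person; the condition names and the number of waves are assumptions made only for the example.

```python
# Hypothetical reports for one person across three waves of interviewing.
waves = [
    {"arthritis", "hypertension"},  # wave 1
    {"arthritis", "asthma"},        # wave 2: asthma active, hypertension not repeated
    {"arthritis"},                  # wave 3: fewer conditions volunteered
]

# Separate tabulation: each wave counted on its own.
per_wave_counts = [len(reported) for reported in waves]

# Cumulative tabulation: a chronic condition counts as reported
# if it was mentioned on any wave.
cumulative = set().union(*waves)

# Checklist an enumerator could carry into wave 3, listing every
# condition reported on the preceding waves for a status check.
checklist_for_wave_3 = waves[0] | waves[1]

print(per_wave_counts, sorted(cumulative), sorted(checklist_for_wave_3))
```

In this illustration no single wave yields more than two conditions, yet the cumulative tabulation yields three. Whether gains of this kind occur in practice is an empirical question that the sketch cannot settle.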

There has been no definitive test of this assumption, but the most relevant 
existing evidence at least raises some doubts about the issue. The single-visit 
Commission on Chronic Illness surveys produced considerably higher prevalence
rates for almost all comparable diagnostic categories than did any of a number
of prior periodic-visit surveys.* This result is almost unquestionably due to the
superiority of the interview schedules employed in the Commission's surveys.
The Baltimore and Hunterdon County interview schedules involved a far greater
number of, and probably more effective, aids to recall than did the instruments
used in the earlier studies. In addition, the lay public may possibly have ex- 
perienced, during the past few decades, a general increase in medical sophis- 
tication which served to increase the volume of illness reported. Therefore the 
future use of a highly detailed interview schedule in a periodic-visit survey might 
well lead to somewhat more complete reporting than was the case in the two 
Commission on Chronic Illness surveys. Still, there is absolutely no basis for 
even hoping that the improvement will be of sufficient magnitude to solve the 
problem.†

Self-Respondents Versus Household Informants.—Another reason why the 
Commission surveys resulted in less than a maximum proportion of the clinically 
determined conditions being reported was the use of a household informant 
rather than interviewing each individual about his health. In both the Hunterdon 
and Baltimore*!:*? studies, it was found that a somewhat larger proportion of the 
illnesses subsequently found in the clinical evaluation had been reported in the 
interview material of individuals reporting for themselves than of individuals 
for whom someone else had reported. Unfortunately, these two categories of 
cases were not randomly selected, so that factors other than self-reporting 
versus reporting by others may have affected the size of the difference. Still,
the hypothesis that an individual reporting for himself will report more illness
than will some other member of the household reporting for him is well supported
by other research. In the Pittsburgh Arsenal Health District study, the differences
in the volume of illness reported by self-respondents and other respondents
were substantial.6? From the San Jose pretest? and postenumeration survey
phase of the California Health Survey,‘* there is also rather conclusive evidence
that other respondents fail to mention conditions that a self-respondent would
very likely have mentioned. Still, in none of these studies was there a random
determination of whether or not an individual was to be interviewed about himself.

*See page 498 of reference 31.

†In one methodologic experiment, there was actually found to be a tendency for individuals participating
in a periodic-visit survey to report a decreasing amount of illness in response to succeeding
waves of interviewing.*® Since control groups showed higher levels of reported morbidity at the later
time periods than did the experimental group, the diminution in reports of illness was probably due to
the repeated interviewing rather than seasonal or similar extraneous factors. Perhaps respondents failed
to reiterate their reports of chronic conditions on later waves because of a desire to avoid being repetitious
or perhaps tedium set in and respondents just did not try as hard on later waves to dredge up conditions
as they had on earlier waves. But whatever the cause of the decrease, it does not vitally affect the
conclusions drawn here. In the experiment, the questionnaire responses for each month were tabulated
separately rather than cumulatively as is here suggested. Thus, if all the different chronic conditions
reported on succeeding waves were combined, there would have been an increase in the number of
conditions reported as compared to a single wave of interviewing. Furthermore, one could supply the
enumerator with a list of the conditions reported for each household on the preceding waves of interviewing
and have the enumerator check on each condition to determine its present status.

In the Pittsburgh arthritis®*! and heart disease studies, the proportions of 
false negatives were substantially less when individuals were interviewed about 
their own health than when the household informant approach was followed. 
Even these findings are somewhat equivocal, though, because a number of 
probably significant changes in the interview situation were confounded with 
the shift from household interview to individual interview.*! For instance, there 
was a modification of the wording of the questions in the individual interview 
which undoubtedly served to reduce the proportion of false negatives even 
though it may well have increased the proportion of false positives for heart 
disease to some extent. The individual interviews dealt, essentially, with a single 
disease while the household interview was of an omnibus character—conceivably 
there is greater likelihood of the admission of affliction with a given condition if 
the interview is directed toward that condition and distracting questions con- 
cerning other conditions are absent. In addition, the individual interview was 
composed only of questions pertaining to one person, the respondent, while the 
household interview generally concerned several individuals. Interviewers for 
the two waves were recruited from different sources and had different degrees of
experience and training in morbidity survey work. Since all these factors could
have contributed to the decrease in the proportion of false negatives, it is not
clear exactly how much of the improvement can reasonably be attributed to the
change from reporting by others to self-reporting, although it seems likely
that this latter factor did have some positive effect. The concomitant increase of
false positives in the heart disease inquiry is unfortunate since the distortion 
introduced through an increase in the rate of false positives for a relatively 
infrequent condition is likely to be greater than the reduction in distortion due 
to an arithmetically equal decrease in the rate of false negatives.” 
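
The asymmetry noted above follows from simple arithmetic. If the false negative rate is taken as a proportion of true cases and the false positive rate as a proportion of persons free of the condition (a definitional assumption made here), the reported prevalence is p(1 - fn) + (1 - p)fp. The sketch below uses hypothetical figures throughout: a true prevalence of 2 per cent, and an equal five-point change in each rate.

```python
from fractions import Fraction

def reported_prevalence(p, fn_rate, fp_rate):
    """Prevalence as it would appear in the survey.

    p        true prevalence of the condition
    fn_rate  share of true cases not reported (false negatives)
    fp_rate  share of unaffected persons reported as cases (false positives)
    """
    return p * (1 - fn_rate) + (1 - p) * fp_rate

# Hypothetical figures for a relatively infrequent condition.
p = Fraction(2, 100)
fn, fp = Fraction(50, 100), Fraction(1, 100)
step = Fraction(5, 100)  # an arithmetically equal change in either rate

baseline = reported_prevalence(p, fn, fp)
shift_from_fn = reported_prevalence(p, fn - step, fp) - baseline
shift_from_fp = reported_prevalence(p, fn, fp + step) - baseline

print(f"shift from fewer false negatives: {float(shift_from_fn):.3f}")
print(f"shift from more false positives:  {float(shift_from_fp):.3f}")
```

Under these assumptions the five-point rise in false positives moves the reported prevalence 49 times as far as the five-point fall in false negatives, the ratio being (1 - p)/p. The rarer the condition, the larger this ratio becomes.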

While all the aforementioned studies of the informant factor suffered from
the absence of randomization, from confounding, or from both, three recent studies met
these particular criteria of experimental design more adequately. Even so, rather
divergent findings emerged from them. In a North Dakota study® there was 
practically no evidence of underreporting by other respondents as compared to 
self-respondents; yet in similarly designed studies conducted in a British housing 
project®® and in Charlotte, North Carolina,®” appreciable differences were found. 
One possible explanation of the divergence of the North Dakota results from the 
British, Charlotte, and practically all previous American findings is the fact 
that the North Dakota study was not a household survey in the usual sense. 
Information was collected only about males 35 years of age and older rather
than about every member of the family as in usual morbidity surveys. Thus, the
housewife did not have to keep in mind her own afflictions and those of all the 
other members of her family; she needed only to answer about her husband. 
It has been suggested*!°® that the characteristically observed tendency of house- 
wives to report less illness for their husbands than the husbands report for them- 
selves is not due solely to a lack of empathy for or awareness of the husband’s 
ailments. A housewife having to report for her entire family may become confused, 
inattentive, and forgetful and not only report less illness for her husband than 
he would for himself but less illness for both herself and for her husband than 
she would if she were being asked only about either one of them. In fact, in two 
studies where individuals who had acted as household informants were later 
reinterviewed as respondents for themselves only, the second interview resulted 
in a substantial increase in the reported prevalence of illness.**°? In at least one
of these instances, the higher reported incidence was accompanied by the expected
decrease in false negatives.” Thus, the absence of bias in the North Dakota study 
may have resulted from the fact that generally the inquiry was concerned with 
the health of only one person. In addition, the North Dakota morbidity questions 
were, owing to the study’s limited objectives, somewhat more definite and highly 
structured and were less likely to elicit equivocal responses than is usually the 
case in an omnibus morbidity survey.°’ It is clear that the degree to which
household survey results are improved by reliance on self-respondents in place 
of household respondents is a function of the breadth and focus of the inquiry, 
the structure of the interview schedule, the competence of the enumerator, and 
a host of other factors. Conceivably, informants are about as adequate as self-respondents
for interviews of extremely narrow scope dealing with highly manifest
and unequivocal phenomena, but the difference is probably rather substantial
for purposes of the conventional omnibus household survey. Of course, whether
the increase in the volume of reports resulting from primary reliance on self- 
respondents is worth the greatly increased costs of such a procedure is an issue 
which has to be resolved on the basis of the objectives and the resources available 
for any given study. 

In any event, the Hunterdon, Baltimore, and Pittsburgh heart disease results 
demonstrate that even individuals responding for themselves fail to report the 
vast majority of the cases of many types of clinically diagnosed conditions. Thus, 
it is extremely unlikely that improvements along this line in the conduct of a 
morbidity survey would provide an adequate solution to the problem of under- 
reporting. 

Other modifications in the interview survey procedures would undoubtedly 
also tend to reduce to some extent the prevalence of false negatives. For instance, 
the structure of the interview schedule and the wording of the questions have been 
shown to have a marked influence on the volume of reported illness.#*:49°!:97,99.6° 
The use of special morbidity diaries or other auxiliary recall devices, when coupled 
with an interview, may also increase the validity of responses.®':®:” One of the 
most important survey factors is, of course, the competence and training of the
interviewing staff.°"®-® But certain decisions on the conceptual plane probably have
an even greater impact than these variables of survey design and execution on
the degree of correspondence to be found between interviews and clinical examinations.
Some of these conceptual issues are: How refined do the disease
categories need to be? How closely do the survey reports need to approximate 
the clinical diagnoses to be considered as matching? How lenient or stringent 
are the diagnostic criteria of each specific condition to be? How extreme must 
an abnormality be before it is considered as a chronic pathologic condition? 

Specificity of Diagnostic Categories.—While one could conceivably employ 
disease categories ranging in specificity from a single global health rating per 
individual to a seven- or eight-digit Standard Nomenclature code or a three- 
or four-digit International Statistical Classification code, rather specific diagnostic 
categories are almost by definition necessary for the purposes of medical research. 
Clearly, if regularity in the etiology or course of any set of medical phenomena is 
to be identified, such regularity is likely to pertain to a rather specific condition. 
Differentiation between diagnostic categories emerges in such a manner as gen- 
erally to maximize suspected medical regularity within categories and so as to 
distinguish among conditions which apparently tend to exhibit diverse natural 
histories. Given this fact, the degree of specificity employed in the Hunterdon 
and Baltimore studies is certainly not too severe. Had a broader set of categories 
or a categorization using some other system of classification been employed in 
the Commission on Chronic Illness studies, the medical relevance of these tests 
of the survey method would have been greatly diminished. Thus, the results of 
the tests are certainly not to be rejected on the grounds of excessive specificity 
of the criterion diagnoses. 

Still, it might be asked whether the interview survey constitutes an adequate 
technique when what is desired is a global assessment of the individual’s health
status rather than information concerning the presence or absence of particular 
diseases. To what degree can gross, holistic judgments of physicians be approxi- 
mated on the basis of survey interview reports? The Commission studies throw 
practically no light on this subject because the unit of analysis employed was the 
chronic condition case rather than the individual.®* It is not known whether 
cases of clinically identified, seriously disabling conditions which had not been 
reported in the interview tended to occur primarily among individuals for whom 
some other conditions had been reported in the interview or among individuals 
who had been reported as being completely free of illness. Obviously, it would 
make quite a difference with respect to the “medical validity” of global judgments
based on survey reports whether most unreported cases of illness tended to be 
found among individuals afflicted with a rather large number of different con- 
ditions or whether these unreported cases tended to be spread rather evenly 
throughout the entire sample. The fact that appreciable prevalence rates of serious 
chronic conditions were found among individuals in those clinical evaluation 
strata characterized as healthy on the basis of the interview data suggests a
fair amount of divergence between clinical findings and self-conceived health
status. Still, in the absence of analyses in which the individual rather than the 
case is used as the tabulating unit, the degree of divergence is not at all certain. 
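
The difference between taking the case and taking the individual as the tabulating unit can be sketched as follows. The records are invented for the illustration; the identifiers, conditions, and field layout are assumptions and do not reproduce the Commission's data.

```python
from collections import defaultdict

# Hypothetical clinically diagnosed cases:
# (person, condition, whether it was reported in the interview)
cases = [
    ("A", "diabetes", True), ("A", "heart disease", False),
    ("B", "arthritis", False),
    ("C", "hypertension", True),
    ("D", "hernia", False), ("D", "asthma", False),
]

# Case as the tabulating unit: how many diagnosed cases went unreported?
unreported_cases = sum(1 for _, _, reported in cases if not reported)

# Individual as the tabulating unit: collect each person's cases together.
by_person = defaultdict(list)
for person, _, reported in cases:
    by_person[person].append(reported)

# Persons whose interview gave no hint of any of their diagnosed conditions.
wholly_missed = sorted(p for p, flags in by_person.items() if not any(flags))
# Persons for whom some conditions were reported and others were not.
partly_missed = sorted(p for p, flags in by_person.items()
                       if any(flags) and not all(flags))

print(unreported_cases, wholly_missed, partly_missed)
```

The case tabulation shows four unreported cases but cannot say where they fall. The individual tabulation shows that, in this invented sample, three of the four belong to two persons whom the interview would have classed as free of chronic illness, which is the distinction that bears on the validity of a global judgment.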

The two studies in which the findings of a clinical examination were compared 
to gross ratings derived from interview data resulted in diametrically 
opposed conclusions. From a study of Michigan farm families,®’ the conclusion 
was reached that, on the basis of a survey interview employing the ‘‘symptoms 
approach,’’ it is possible to approximate with sufficient accuracy a physician's 
judgment as to whether or not a given individual is in need of medical care. From 
a study of retirement®® it was concluded that even on a global level the findings 
from medical examinations and from self-administered questionnaires were so 
little in accord that the survey responses could in no sense be considered as a 
surrogate for a medical examination. These contradictory conclusions are of 
course in part caused by differences in the objectives of the two inquiries and, 
consequently, by differences in the standards of what constitutes an adequate 
degree of agreement between the two methods of assessment. In addition, the 
Michigan approach was indirect in the sense that individuals were not simply 
asked whether they thought they were in need of medical care—a screening device 
was employed to arrive at that judgment. Thus, a given individual may be 
classified as needing medical attention both on the basis of a clinical examination 
and on his responses to the screening device, and yet utterly different ailments 
could have been responsible for the two positive classifications. Such fortuitous 
medical validity is pragmatically satisfactory. Even so, the agreement between 
the classifications based on the symptoms list and those based on the medical 
examination was far from perfect. At present, the bulk of the research evidence 
can be interpreted as indicating that a clinical assessment of general health and 
the responses to survey questions about health are two only slightly correlated 
phenomena, but, conceivably, improvements in the survey instrument could 
lead to some increase in that correlation. 

It should be noted that in inquiries operating on the level of gross ratings 
of health, medical validity may not always be necessary. Since an individual's 
behavior in regard to his health is held to be at least in part determined by his 
own judgment of it rather than by more objective criteria,® the discrepancy 
between clinical findings and the judgments expressed in an interview need not 
invariably constitute a serious problem. This is particularly true if, as appears 
likely, the agreement between survey responses and the evaluation of the re- 
spondent’s health by his own regular doctor is at least somewhat greater than 
the agreement between the survey responses and an evaluation made by a clinician 
on the basis of a single contact with the patient. Of course, holistic ratings of 
health are of little value in strictly medical research. 

Stringency of Agreement Standards.—With respect to the degree of simi- 
larity required between the survey interview response and the clinical diagnosis 
for them to be considered matching, the standards employed in the Commission 
studies were about as lenient as they could reasonably be made to be. Both the 
essentially automatic ‘‘code comparison’’ and the more subjective ‘‘medical 
judgment’’ methods were used to ensure a maximum degree of matching.*!*% 
Survey reports which were quite vague or misleading were accepted as matching 
a clinical diagnosis whenever there was reason to believe that the respondent 
had even remotely referred to a manifestation of that condition. In many cases 
it was extremely unlikely that, considering the clinical diagnosis as the criterion, 
the condition could have been even approximately properly classified on the 
basis of the interview material alone. In other words, many of the interview 
reports rated as matching clinical diagnoses to a low degree would undoubtedly 
be epidemiologically useless. Thus, the Commission findings can hardly be dis- 
carded on the grounds of having employed overly stringent standards in matching 
data. 

Standards of Diagnostic Validity.—In the course of a number of household 
morbidity surveys, the diagnoses of the attended conditions mentioned in the 
interviews were submitted to the attending physicians for verification.29 ®-7* 
In several of these studies the corrected diagnoses elicited from the physicians 
were actually substituted for the interview data in the tabulations,®:7°:”2 while 
in the others the medical verification was conducted for the purpose of determin- 
ing the degree of confidence to be placed in the lay reports. In those studies where 
the outcome of the comparisons was tabulated and reported,?’:7!:7* a reasonably 
high degree of correspondence was found between the reports of a household 
informant and the reports of the physicians attending the various members of 
the family—at least one would hardly suspect, on the basis of these physician 
checks, that only about one-fourth of the conditions to be found in a clinical 
examination were reported in the interviews. While the diagnoses submitted by 
the attending physicians tended to be more detailed and specific than the lay 
reports, the two reports would have been classified in the same general disease 
category in a vast majority of the cases. Even when physicians were requested 
to submit diagnoses of a patient’s condition without being told what had been 
reported for that patient in the interview, the level of agreement between inter- 
view and physician reports has been considerably higher than between interview 
and clinical evaluation findings.?’:*' Results similarly favorable to interview 
surveys have come from record checks**:! and comparisons on the aggregate 
level with official statistics.” 

One possible explanation of the divergence between the results of the comparison 
with the report of the regular physician and the comparison with the 
clinical evaluation findings is that the clinicians were overly zealous in their 
diagnoses. In this connection, the following observations were made at a conference 
on morbidity interview surveys.’® 


“In clinical evaluation studies a factor about which little is known is the manner in which 
the doctor reacts to the situation in which he is asked to examine a cross-section sample of in- 
dividuals, including both sick and well. Does he tend to classify as significant conditions which 
he might pass over if the individual came to him with a complaint? The numbers of chronic 
conditions picked up in such studies is certainly surprisingly high. 

“When a doctor makes a diagnosis of a chronic disease on the basis of information obtained 
in a clinical evaluation before troublesome symptoms and disability have begun to appear, he is 
making a prediction that a certain course of events will follow. We need to study the accuracy 
of such predictions in longitudinal studies to learn how significant some of the conditions picked 
up in clinical evaluation actually are. 

“Follow-up studies of mortality were suggested as a possible means of measuring the accuracy 
of the physician's prediction in making a diagnosis after a clinical examination.” 


While independent reviews of the medical records by several qualified clin- 
icians would be required for a definitive determination of the extent to which 
overdiagnosis distorted the Baltimore and Hunterdon County findings, there is, 
nevertheless, sufficient evidence available to enable us to make at least a tentative 
judgement of this problem. 

The investigators were clearly aware of the danger of an unreasonably low 
diagnostic threshold and in consequence adopted various safeguards. A sub- 
stantial number of minor ailments which had been originally diagnosed were 
later discarded in the course of medical editing in the Baltimore study. Also in 
Baltimore, two cardiologists acted as a buffer against spurious diagnoses through 
a critical review of the medical records of ‘‘evaluees’’ suspected by the original 
examining physician of having heart disease.*! 

In addition to this control at the data collection stage, certain analytic 
precautions also served as protection against the possible consequences of over- 
diagnosis. Conditions of a minor character, in terms of several different criteria, 
were excluded from the tabulations. But, even when only conditions which were 
disabling, were chronic, ‘could have been reported,’ required medical super- 
vision, or were rated as severe were considered, matches could be found among 
the interview-reported conditions for no more than one-third of the clinically 
diagnosed cases. Furthermore, confidence in the meaningfulness of these results 
is increased by the fact that corresponding levels of underreporting were found 
in three studies dealing with single disease groups.°)°?)™ 

The aforementioned gap between the amount of illness reported by the 
regular attending physicians and the amount of illness found in the clinical 
evaluation leads one to suspect that at least part of the underreporting by the 
lay informants was due to the failure of their regular physicians to note or prop- 
erly interpret certain abnormalities. It is certainly well established that a great 
many cases of illness remain undetected until their fairly late stages, notwith- 
standing the fact that they could have been discovered earlier through a thorough 
clinical examination or at times merely through a series of screening tests.7*7° 
In fact, given the amount of clinically diagnosable illness apparently unknown 
to the public’s regular physicians, the chances are extremely slim that a marked 
increase in the adequacy of interview morbidity data could result from improve- 
ments in survey techniques. 

Although in various recent discussions of interview morbidity data some 
consideration has been given to the relative suitability of alternative standards 
of valid measurement, there seems to have been some confusion on this subject 
in the past. The relatively high rate of verification by attending physicians 
of diagnoses reported in interviews led to the rather widespread acceptance of 
morbidity survey data. While these data may have in fact been suitable for the 
prosecution of certain sociomedical or public health objectives, they were gen- 
erally of inadequate validity for epidemiologic purposes. We might clarify this 
problem by distinguishing five intersecting sets of diagnoses which might be 
considered applicable to a given individual during a certain time period: 

(1) Diagnoses which would be made on the basis of an autopsy or explora- 
tory operation. 

(2) Diagnoses which would be made on the basis of an extremely thorough 
examination in a clinical setting. 

(3) Diagnoses which would be made by the individual’s regular physician 
in the course of routine medical care. 

(4) Diagnoses believed by the patient to apply to himself. 

(5) Diagnoses reported in an interview with the patient or a member of his 
family.* 

As was pointed out earlier in this discussion, there is usually a certain amount 
of overlapping between the diagnoses belonging to any two of the above sets 
and each set is likely to contain some conditions not appearing elsewhere. It is 
reasonable to assume that the degree of correspondence between sets of diagnoses 
occupying adjacent ranks in the previous listing is greater than between sets 
of more disparate rank. Even the diagnoses resulting from a clinical evaluation 
are somewhat divergent from those based on autopsy, but the clinical diagnoses 
can at least be considered as a reasonable approximation of the criterion. The 
diagnoses made by the regular physician in his normal contacts with the patient 
are considerably further removed from what would be found by autopsy. Yet 
the physician does not even fully inform the patient about what he does think 
is wrong, while the patient, in turn, remembers and understands only part of 
what is told to him by the physician. Finally, not all conditions of which the 
patient and his family are aware are actually reported in the interview. It is 
thus hardly surprising that references to so small a proportion of clinically estab- 
lished conditions are elicited through morbidity interviews. 


*One could also differentiate perceived abnormalities from diagnoses. During a given time period, 
an organism can be characterized either in terms of discrete abnormalities of structure and/or function 
or in terms of explanatory categories (diagnoses) which serve to account for clusters of symptoms as 
well as to anticipate their future course. Just as we have defined five sets of diagnoses, we can also define 
five intersecting sets of symptoms—obviously the pathologists, clinicians, independent practitioners, and 
laymen are each so equipped and situated that they are likely to note some abnormalities which are 
unlikely to come to the attention of the others. In addition, the conceptualization which is involved 
in even the most naive cognition of discrete pathologic conditions tends further to differentiate the 
perceptions of the several types of observers. Thus the extent of agreement between observers on the 
symptomatic level would probably not be much greater than the extent of agreement on the diagnostic 
level, except insofar as the sensations experienced and reported by the patient are accepted as data by the 
medical observers. 

While laymen probably tend to think in terms of the discrete abnormalities to a greater extent 
than do the professionals, whenever symptoms are not understood (a diagnosis has not been made) the 
professionals also must revert to a rather primitive conceptual level, although a more sophisticated 
one than that of the average layman. Thus, an organism is in practice characterized in terms of a mixture 
of symptoms, diagnoses of disease conditions, and intermediate level concepts regardless of the observer's 
status. The text above deals only with diagnoses for the sake of simplicity. 

Another oversimplification in the text is the implicit assumption that different observers of a given 
type, i.e., different pathologists, different clinicians, or different independent practitioners, would arrive 
at the same diagnosis if they observed the same patient during a given short time period. A more accurate 
model would involve the depiction of each diagnosis in each set as a random variable. In other 
words, it would be more accurate to say that a particular individual during a specific time period has a 
certain probability that a rheumatologist who is examining him will classify him as having rheumatoid 
arthritis rather than to say that a clinician would invariably so classify him. Similarly, the patient 
himself at one instant may diagnose his ailment as one condition and at another instant as some completely 
different condition, even during an extremely short time period. But such conceptual refinements 
are unnecessary for the purposes of the present discussion, so the certainty model presented in the text 
will suffice. 

Consequence of Severe Underreporting.—Since the goal of the household 
survey is the estimation of rates in a population rather than the diagnosis of 
the conditions afflicting any particular individual, it might appear that the types 
of inaccuracies revealed by the clinical evaluation studies are not an insurmountable 
problem. One might think that it makes little difference if the conditions of 
a large proportion of the population are misclassified as long as the aggregate 
results are the same. Even though in past surveys there has been, in terms of 
the clinical findings, far more underreporting than overreporting of disease, 
one could, perhaps, design an interview schedule which would produce a balance 
in the two types of error when administered to at least certain populations. 
But such a notion is obviously absurd, since if enough were known to be able 
to create such an instrument and to designate where and when it could be prop- 
erly used, there would be no need for it. 

Actually, the absolute level of an incidence or prevalence rate is rarely 
important from the strictly medical point of view. What is usually desired from 
surveys for epidemiologic purposes are estimates of the differences in the incidence 
rates of particular conditions among various classes of individuals or over time. 
It has been suggested that the derivation of estimates of such differences from 
household survey data will generally result merely in attenuation, i.e., the 
expected value of the difference between two population subgroups in the incidence 
of a given disease will be diminished by using survey data rather than 
clinical data.2®*! If this were generally true, survey data could be viewed as a 
rather inefficient but nevertheless permissible substitute for clinical data, particularly 
when one is only interested in discovering quite large differences between 
subgroups and when a much larger and more complete survey than clinical 
sample is feasible. A positive association between the survey responses and the 
clinical diagnoses would, of course, be necessary, but this condition would cer- 
tainly almost always be met. But, in addition, a considerably more dubious 
assumption underlies the biomedical use of survey data. The rates of false nega- 
tives within the various population subgroups being compared must be identical. 
This condition of equality must also hold among the rates of false positives. In 
other words, if one were interested in comparing the incidence of a given disease 
among individuals exhibiting considerable anxiety and among individuals not 
exhibiting considerable anxiety, the rate of false positives in the two groups 
would have to be approximately equal, and the rate of false negatives in the 
two groups would also have to be approximately equal. 
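
The attenuation argument can be checked with a little arithmetic. The sketch below assumes a simple misclassification model which the text does not spell out: within each group a true case goes unreported with probability false_neg, and a non-case is reported as a case with probability false_pos. The function name and all rates are invented for illustration. When the two error rates are identical in both groups, the expected survey difference equals the true difference multiplied by (1 - false_neg - false_pos), a factor which is positive whenever survey response and clinical diagnosis are positively associated.

```python
def observed_rate(true_rate: float, false_neg: float, false_pos: float) -> float:
    """Expected survey-reported rate for one group.

    true_rate: clinically established rate of the disease in the group.
    false_neg: probability that a true case goes unreported in the interview.
    false_pos: probability that a non-case is reported as having the disease.
    """
    return true_rate * (1.0 - false_neg) + (1.0 - true_rate) * false_pos


# Hypothetical true rates in two subgroups, with the SAME error rates in both.
true_a, true_b = 0.10, 0.05
false_neg, false_pos = 0.70, 0.02

obs_a = observed_rate(true_a, false_neg, false_pos)
obs_b = observed_rate(true_b, false_neg, false_pos)

true_diff = true_a - true_b
obs_diff = obs_a - obs_b
attenuation = 1.0 - false_neg - false_pos

print(f"true difference     {true_diff:.4f}")
print(f"expected survey diff {obs_diff:.4f}")
print(f"attenuation factor   {attenuation:.2f}")
```

With these invented rates the expected survey difference retains the correct sign but is only about 28 per cent of the true difference, which is the sense in which survey data would be an inefficient but permissible substitute.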

While no data are available pertaining to the medical validity of survey 
responses of the type required for the estimation of incidence rates, the aforementioned 
evaluations of survey prevalence reports are suggestive. There is 
evidence from both of the Commission studies and the Pittsburgh studies to 
the effect that the rate of false negatives varies considerably for many conditions 
among demographically characterized subgroups.*!°*! This is probably also 
true of false positives.” It can also be assumed that intergroup variability in 
accuracy of reports is at least as severe a problem with respect to incidence data 
as with respect to prevalence data. Thus, the problem of using survey data as 
a substitute for clinical data in the determination of differences in incidence 
rates is more than a mere problem of attenuation. Attenuation means that we 
will tend to underestimate the size of intergroup differences and will, in fact, 
sometimes conclude that there are no differences when in reality there are. The 
false inference of the absence of a difference can be made rather infrequent by 
using large enough samples (although nothing can be done about the under- 
estimation of the size of the difference). But the likelihood of errors resulting 
from a correlation between the ‘‘probability of being misclassified on the basis 
of survey data’’ and relevant characteristics of the individual cannot be reduced 
by increasing the size of the sample. The ‘‘true difference”’ in incidence can tend 
to be either overestimated or underestimated in this case, and, in fact, incorrect 
inferences as to even the direction of the difference can become quite likely, 
depending on the correlation between accuracy and individual characteristics. 
This danger resulting from the use of morbidity data of dubious medical validity 
has been clearly pointed out in the past***; it would appear to be worthy of 
further stress. 

The danger of false inference as to the existence of differences in incidence 
or as to their size or direction would appear to be most severe when the sub- 
groups being compared are distinguished in terms of psychologic variables.°®° 
For instance, if, as in the example cited earlier, it is desired to measure the asso- 
ciation between anxiety level and the development of some particular disease, 
reliance on survey data seems especially temerarious. Ignoring the problem of 
measuring anxiety, it seems quite likely that the more anxious the individual, 
the more likely he is to become a false positive and the less likely he is to become 
a false negative. This problem would be similarly acute with respect to most 
other psychologic variables and many social variables as well. 
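
A hypothetical version of the anxiety example may make the danger plainer. The sketch below assumes a simple misclassification model not stated in the text: a true case goes unreported with probability false_neg and a non-case is reported with probability false_pos, and both probabilities are allowed to differ between the groups. The class name, field names, and every rate are invented. In this illustration the anxious group truly has the lower incidence, yet its higher rate of false positives and lower rate of false negatives give it the higher expected survey rate.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One subgroup under an assumed misclassification model (all values hypothetical)."""
    name: str
    true_rate: float   # clinically established incidence
    false_neg: float   # P(true case not reported in the interview)
    false_pos: float   # P(non-case reported as a case)

    def observed_rate(self) -> float:
        """Expected incidence as it would appear in the survey data."""
        return (self.true_rate * (1.0 - self.false_neg)
                + (1.0 - self.true_rate) * self.false_pos)


# Anxious respondents: fewer false negatives, more false positives.
anxious = Group("anxious", true_rate=0.05, false_neg=0.50, false_pos=0.06)
calm = Group("not anxious", true_rate=0.08, false_neg=0.80, false_pos=0.01)

true_diff = anxious.true_rate - calm.true_rate
obs_diff = anxious.observed_rate() - calm.observed_rate()

print(f"true difference (anxious - calm)     {true_diff:+.4f}")
print(f"expected survey difference           {obs_diff:+.4f}")
```

Since these are expected values, enlarging the sample would only estimate the wrong difference more precisely; the reversal of direction is a property of the reporting errors, not of sampling variation.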

What has been said about the use of survey data as a basis for inferences 
concerning differences in incidence rates is equally applicable to prevalence rates; 
however, prevalence differences, in addition, present many other problems in 
an aggravated form. For epidemiologic purposes, it is necessary to determine 
the degree to which a difference in prevalence is due to a difference in incidence 
as against differences in the death rate or the remission rate among those afflicted 
with the condition. In addition, the problem of inferring the direction of causality 
is even more severe with respect to prevalence data than with respect to inci- 
dence data.®® While these interpretive difficulties are no more severe when dealing 
with survey prevalence data than when dealing with clinical prevalence data, 
the conjunction of these problems with the ones arising from the use of data 
of dubious validity makes for an intolerable degree of equivocality. 

The foregoing pessimism concerning the use of interview surveys in the 
collection of epidemiologic data is in a sense based on faith rather than on firm 
evidence. It is conceivable that biomedically valuable generalizations could be 
derived on the basis of interview reports bearing on the absence or presence of 
a given condition. This is obviously the case when the interview “‘reporting error’’ 
is essentially independent of the possibly causal factors under study and there 
is a relatively strong association between interview and clinical diagnoses. But 
even if these conditions did not hold, it is possible that the biased conglomeration 
of cases of a given condition reported on an interview could be found to exhibit 
medically useful regularities which might or might not be as clearly exhibited 
by the whole array of clinically diagnosed cases of the condition. In other words, 
in some instances the lay diagnoses might be equal or even superior to the medical 
ones for purposes of deriving generalizations with preventive, therapeutic, or 
prognostic utility. For instance, in the previously discussed study of retirement, 
it was found that an individual's over-all rating of his own health was about as 
good a predictor of whether or not he was going to die within the next 2 years 
as was a physician's rating, even though the two sets of ratings were not highly 
correlated with each other. While this finding is almost undoubtedly the result 
of chance and is not particularly relevant to the present discussion in that it 
pertains to gross health ratings rather than to diagnoses of specific conditions, 
it is at least provocative. Nevertheless, this one bit of evidence is hardly sufficient 
to warrant giving serious consideration to the heresy set forth above. It is unquestionably 
far more likely that useful generalizations can be reached by study 
of the full array of clinically diagnosed cases of a condition than by study of 
self-reported cases, although it might be interesting to determine just what 
predictive power, if any, the interview diagnoses do have. 

Before concluding this discussion of the underreporting of illness, it should 
be made clear that there may well be considerable variation among diseases in 
the adequacy of the interview data. The extremely small number of clinical 
‘“evaluees”’ afflicted with any particular disease in the Commission studies makes 
it impossible to differentiate between diseases in terms of interview adequacy 
with any degree of precision. Still, in both the Hunterdon County and Baltimore 
studies, the underreporting problem was considerably less severe, for instance, 
for allergic conditions than for most other types of conditions. Thus, conceivably, 
household surveys could be used to gather epidemiologic data in connection 
with certain chronic diseases, if not for the bulk of them. Furthermore, it is 
possible that the reporting of acute conditions is generally more nearly adequate 
than the reporting of chronic conditions. There are a priori grounds for supposing 
this to be the case, in that acute conditions may tend to hold one’s attention 
more than chronic conditions to which one has become completely accustomed. 
But, this discussion must remain in the realm of speculation because the several 
tests of survey morbidity data have been essentially restricted to chronic con- 
ditions. The survey coverage of acute conditions may well be superior to that of 
chronic conditions and still be rather inadequate. There is known to be a serious 
recall problem, at least for short-term and minor acute conditions, if the period 
of time to which the questioning pertains is of any appreciable length.*!: Never- 
theless, there is scarcely an alternative means of collecting data on both attended 
and unattended acute illness in a general population, so household surveys will 
undoubtedly continue to be used for this purpose. 


CONCLUSIONS 

The potential of the household interview survey as a technique for the 
direct collection of data usable in medical research has in the past been overrated 
by at least some investigators. This in large part resulted from several early 
studies in the course of which the physicians regularly attending the survey 
respondents were asked to confirm or deny certain diagnoses reported by their 
patients in the course of the interview. The rather high level of confirmation 
resulting from these tests led to an overly optimistic view of the validity of the 
data. Actually, these surveys had as their objectives primarily the provision 
of data for public health purposes rather than for purposes of medical research. 
Conceivably, the attending physicians’ diagnosis of markedly disabling conditions 
is a suitable criterion for the validity of some types of public health data. But, 
the large amount of undetected illness in the population makes this criterion 
inadequate for data to be used for epidemiologic purposes. The recent tests 
involving the comparison of interview reports with the findings of thorough 
clinical work-ups indicate that, generally, such a small proportion of those having 
a given condition report that condition in an interview that probably the group 
which does report the illness is seriously unrepresentative of all those suffering 
from it. Thus, there is considerable danger involved in relying on the diagnoses 
imparted by a lay respondent to a lay interviewer when generalizations in the 
medical realm are sought. 

The apparent infeasibility of the collection through survey techniques of 
valid diagnostic information has led one group of investigators to suggest that, 
in future surveys, clinical categories be abandoned in favor of categorization 
in terms of symptoms and disabilities.*' While such a shift from a genotypical 
mode of classification to a phenotypical mode would involve, from a strictly 
medical point of view, the regression to an earlier stage of conceptualization, 
the suggestion may have some merit for research into the personal and social 
consequences of ill-health. An individual’s conception of his own health status 
seems to have far more effect on his behavior than has his ‘“‘objective’’ health 
status, as assessed by a physician.** To the extent that individuals do think 
about their health in terms of symptoms rather than causes, greater reliance on 
detailed symptom categories in place of formal diagnostic categories might be 
in order. Actually though, the benefits to be derived from the acceptance of 
illness reports as subjective rather than objective data may be more illusory 
than real. It is possible that expressions of subjective health status are of little 
more consequence than the diagnoses reported in interviews, in that they 
may merely be extremely labile reflections of highly unstable situational 
forces. The seeming promise of the experiential approach may be the simple 
consequence of the fact that we have not subjected such data to the same degree 
of methodologic scrutiny as has been brought to bear on data of a more objective 
type. Should such subjective variables turn out to be of value, their valid measure- 
ment will still undoubtedly present considerably greater difficulties than might 
be supposed. 

Returning to the problem of obtaining medically valid diagnoses through 
interview material, it is noteworthy that rather direct methods of measurement 
have generally been employed in morbidity studies. In order for an individual 
to be coded as having heart disease, he practically has to make a direct statement 
to the effect that there is something wrong with his heart. An alternative approach 
would be to consider as having heart disease those individuals who gave a certain 
pattern of responses to a series of questions involving either just symptoms or 
both symptoms and conditions. Such a procedure has shown some promise with 
respect to rheumatoid arthritis.*°* It requires the standardization of a battery 
of screening items against a clinically derived criterion. Further experimentation 
along this line might very well be fruitful for other conditions exhibiting relatively 
unique syndromes. Nevertheless, there are many conditions for which it is 
unlikely that a standardized medical history elicited by a lay interviewer can 
lead to a level of diagnostic accuracy substantially greater than that of direct 
questioning.“”**’ Furthermore, while batteries of detailed questions concerning 
symptoms might be utilized in inquiries dealing with only one or a few different 
conditions, the use of such batteries in connection with a number of different 
conditions as part of one of the usual omnibus morbidity surveys would clearly 
be infeasible. 

All things considered, it appears that, for certain chronic conditions, clinical 
examinations cannot be avoided in conducting epidemiologic inquiries based 
on general populations. Given the relative infrequency of most conditions in 
which we are likely to be interested, an extremely large number of individuals 
would have to be examined in order to obtain a sufficiently large number of 
individuals with disease from a simple random sample of a general population. 
The cost of conducting thorough medical examinations is so great as to make 
inquiries on such a scale utterly impractical. A solution which has been proposed 
for this problem is the use of an interview survey as the first stage of a double- 
sampling procedure.*!’*! It may well be possible to design batteries of morbidity 
questions with adequate sensitivity and selectivity to serve as a screening device. 
Higher sampling ratios would be used in the strata of suspected cases than in 
the stratum composed of those who are ostensibly well. If the results of the 
screening device correlate reasonably well with the findings of the clinical exami- 
nation and if the sampling ratios in the various strata are in proper proportion 
to each other, great gains can be attained in the efficiency of the clinically 
examined sample. But considerable care must be exercised not to employ a broader 
range of sampling ratios than is warranted by the discriminating power of the 
screening instrument, since the clinical sample can, in that eventuality, actually 
become extremely inefficient. With proper precaution, though, the household 
interview survey could actually become an extremely useful adjunct of clinical 
studies. 
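
The trade-off described above can be illustrated with a minimal sketch. It is not taken from the studies cited; it assumes only two strata (suspected cases and the ostensibly well), ignores the finite population correction and the cost of the interview stage, and uses hypothetical values for prevalence, sensitivity, and specificity (the latter standing in for what the text calls selectivity). The function names are illustrative. The sketch compares the variance of the prevalence estimate from the clinically examined sample with that of a simple random sample requiring the same number of examinations.

```python
from math import sqrt


def strata(prevalence, sensitivity, specificity):
    """Return (weight, disease rate) for the screen-positive and
    screen-negative strata created by the interview."""
    w_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    w_neg = 1 - w_pos
    p_pos = prevalence * sensitivity / w_pos        # disease rate among suspects
    p_neg = prevalence * (1 - sensitivity) / w_neg  # disease rate among the "well"
    return (w_pos, p_pos), (w_neg, p_neg)


def relative_variance(prevalence, sensitivity, specificity, ratio):
    """Variance of the stratified prevalence estimate divided by the variance
    of a simple random sample with the same number of examinations.

    ratio = sampling fraction among suspects / sampling fraction among the well.
    Values below 1 mean the double sample is the more efficient design.
    """
    (w_pos, p_pos), (w_neg, p_neg) = strata(prevalence, sensitivity, specificity)
    # Share of all examinations that falls in each stratum.
    share_pos = w_pos * ratio / (w_pos * ratio + w_neg)
    share_neg = 1 - share_pos
    var_strat = (w_pos ** 2 * p_pos * (1 - p_pos) / share_pos
                 + w_neg ** 2 * p_neg * (1 - p_neg) / share_neg)
    return var_strat / (prevalence * (1 - prevalence))


def best_ratio(prevalence, sensitivity, specificity):
    """Neyman allocation: sampling fractions proportional to the stratum
    standard deviations (undefined if the screen is perfectly sensitive)."""
    (_, p_pos), (_, p_neg) = strata(prevalence, sensitivity, specificity)
    return sqrt(p_pos * (1 - p_pos)) / sqrt(p_neg * (1 - p_neg))


if __name__ == "__main__":
    # Hypothetical screens applied to a condition with 2 per cent prevalence.
    for label, se, sp in [("sharp screen", 0.90, 0.95), ("weak screen", 0.50, 0.60)]:
        print(label, "- best ratio about", round(best_ratio(0.02, se, sp), 1))
        for ratio in (1, 5, 20):
            print("  ratio", ratio, "->", round(relative_variance(0.02, se, sp, ratio), 2))
```

Under these assumptions, a sharply discriminating screen supports a wide spread of sampling ratios and yields a relative variance well below 1, whereas the same wide spread applied to a weakly discriminating screen pushes the relative variance above 1, which is the inefficiency the text warns against.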
Book Reviews 


STERIC COURSE OF MICROBIOLOGICAL REACTIONS. Ciba Foundation Study Group 
No. 2, A. Neuberger, Chairman. Edited by G. E. W. Wolstenholme, O.B.E., M.A., M.B., 
M.R.C.P., and Cecilia M. O’Connor, B.Sc. Boston, 1959, Little, Brown & Company. Pp. 
115, indexed. 


As pointed out by the chairman of the study group, A. Neuberger, two concepts dominated 
the proceedings: the direct and stereospecific transfer of hydrogen between substrate and pyridine 
nucleotide coenzyme, and the stereochemistry of the attachment of substrate to enzyme. These 
subjects were discussed in formal presentations and in general question and answer sessions by a 
number of chemists and biochemists, including V. Prelog, N. O. Kaplan, P. Talalay, and H. 
Krebs. As one might expect, the research interests of the participants set the pattern for the dis- 
cussion. 

Of particular interest to the reviewer was K. Wallenfels’ presentation concerning his work 
with pyridine nucleotide coenzyme models and the subsequent discussion as to the form in which 
hydrogen is transferred. 

With the exception of a short historical review by F. Westheimer, the presentation was 
extremely detailed, suggesting that the book would be most valuable to the biochemist interested 
in electron transport or in the mechanism of enzyme action. 

Gale W. Rafter 


THE ACUTE MEDICAL SYNDROMES AND EMERGENCIES. DIAGNOSIS AND 
TREATMENT. By Albert Salisbury Hyman, M.D. New York, 1959, Landsberger Medical 
Books, Inc. Pp. 442, indexed. Price $8.75. 


This concise and readable volume condenses theory, literature, old and new treatments, 
and extensive experience into a few pages on each of its subjects. Active physicians were polled 
to indicate the medical emergencies that they encounter and this book was prepared to consider 
them. The first half of the book is written by Dr. Hyman and is on cardiovascular emergencies. 
He succeeds in bringing all of the pertinent information and his rich experience into the discussion 
of diagnosis and helps choose between many competing therapeutic regimens. However, alternative 
treatments are also presented, with their advantages. 

This section is difficult to criticize. Giving actual dosage and route of administration of drugs 
would prove most helpful. Adrestat and Synkayvite are recommended as equal to Mephyton as 
antidotes to excess coumarin anticoagulants (p. 77); surely, this was not intentional. 

Dr. Weiss covers the gastrointestinal emergencies in 70 pages. The sections on anatomy, 
physiology, and differential diagnosis are complete. Discussion of the controversial aspects of 
treatment is often meager. The statement that the treatment is “symptomatic” in the case of 
many entities does not permit the reader to sample the experience of the author. Dr. Ornstein, 
in 60 pages devoted to pulmonary emergencies, writes concisely of his field and quite adequately 
covers it. Dr. Root discusses diabetes in a 34-page section, calling upon a lifetime of study to 
produce a few time-proved recommendations for each acute situation. The section on renal 
emergencies is sketchy on recommended measures for diagnosis and treatment. The possible hazard 
of drugs in renal insufficiency is not amply discussed. Barbiturate intoxication is discussed, but 
the general availability of Megimide should be noted, rather than the statement that “it is 
available as an experimental drug.” 

The book is a convenient size (5¾ inches by 8% inches) for desk, office, or glove compartment. 
The type is large and the publisher has assembled a pleasing book. The authors have filled 
a need for a concise book on this subject; they have not presented another manual but a learned 
integration of background physiology, pathology, and differential diagnosis that will prove 
valuable in permitting a good acquaintance with a subject in a few minutes. They chose to put 
“newness” into its proper perspective and have succeeded in all areas except in treatment, where 
some of the authors properly mention new treatments and judge them but other authors omit 
and, therefore, do not judge treatment. The recommendations are both modern and moderate, 
the errors very few. This book is to be recommended for hospital and medical staff libraries and 
will be useful to practitioners who encounter these problems infrequently. 


Frank L. Iber 


FAMILY MEDICAL ENCYCLOPEDIA. By Justus J. Schifferes, Ph.D. New York, 1959, 
Permabooks, Pocket Books, Inc. Pp. 619. Price 50 cents. 


Dr. Schifferes’ Family Medical Encyclopedia is another addition to a growing list of medical 
publications for the layman. He states in his preface that the purpose of the book is to “provide 
easily understood, common-sense information whose mastery will render medical and life 
emergencies less likely to happen.” He goes on to say that it “aims to create attitudes toward health, 
disease, and specific disease conditions which will lead toward healthier living generally and 
permit a person to overcome the obstacles of disease (psychological, as well as physical).” 

It should be stated at the outset of this review that the author achieves his objective reasonably 
well. The obvious handicap in trying to meld dictionary and encyclopedia into a compact 
pocket book that sells for 50 cents and contains only around 600 pages has been offset by the 
author’s judicious choice of the more frequently occurring and more interesting medical 
phenomena. Also included in the book are a first-aid index, a list of medical code letters, a weight chart, 
and a calorie table for foods and beverages. 

The inclusion of more illustrations would have enhanced the book’s value; the existing drawings 
are generally good but in some cases lack a three-dimensional quality which would have 
made their interpretation a little easier. Be that as it may, the book is worth recommendation 
to patients and can be looked upon as a useful addition to their library, until it goes out of date 
or a better one is written. 

John B. Murphy 


THE STORY OF DISSECTION. By Jack Kevorkian, M.D. New York, 1959, Philosophical 
Library. Pp. 80, not indexed. Price $3.75. 


This small book is an account of the varying attitudes of different ages and cultures toward 
the dissection of the animal and human body. The first half of the book is the most interesting, 
particularly those sections dealing with the influence of the Catholic Church on dissection. By 
contrast, the second half seems dull and superficial. Dr. Kevorkian’s brief review is enlivened 
by a number of photographs of famous figures and a fascinating chart plotting the popularity 
of dissection over 55 centuries. 


Louis Lasagna 


Announcements 


THE INTERNATIONAL CONFERENCE ON CONGENITAL MALFORMATIONS will be held in London, 
July 18 to 22, 1960, under the sponsorship of The National Foundation. The meetings will take 
place in Church House and the headquarters hotel will be Grosvenor House. The Honorary 
Presidents of the London conference include Sir Geoffrey Marshall, President of the Royal 
Society of Medicine, and Mr. Basil O’Connor, President of The National Foundation. Prof. James 
D. Boyd, Cambridge University, will be the General Chairman. 

The opening address, which will establish the theme of the conference, will be made by George 
W. Corner of the Carnegie Institute of Washington and the Rockefeller Institute in New York. 


The program of the conference will deal particularly with the incidence of congenital 
malformations and their relationship to social and medical conditions, the genetic and environmental 
factors that may be responsible, the normal mechanisms of embryogenesis and the conditions 
which result in abnormalities, the relationships between mother and fetus during pregnancy, 
and obstetric problems related to deformity. 

In the closing session, American and foreign leaders in research on these subjects will 
summarize the developments brought forth in the daily sessions. Investigators from more than ten 
nations will participate in the program. 

Further information concerning the conference may be obtained by addressing the 
secretariat: Mr. Stanley E. Henwood, Executive Secretary, International Medical Congress, Ltd., 
120 Broadway, New York, N. Y. 


THE SECOND ANNUAL SUMMER CAMP PROGRAM FOR CHILDREN WITH EPILEPSY is scheduled 
to get underway in June, 1960, at the National Children’s Rehabilitation Center, Leesburg, Va. 
Applications for admission are now being accepted, according to Dr. Charles Kram, Director. 

The NCRC is a nonprofit facility supported by the Federal Association for Epilepsy in 
Washington, D. C. 

Children between the ages of 7 and 12, from all parts of the country, are eligible for 
admission. Children will be admitted for periods ranging from 2 weeks up to the duration of the summer. 


Swimming, overnight hikes, camp craft, dancing, and art are among the many activities. 
The more than 200-acre site of the Center is located in the foothills of Virginia’s Blue 
Ridge Mountains. 

Fees for the camp program, like those for the regular year-round program, are based upon 
each family’s financial situation. 

For further information write to Federal Association for Epilepsy, Inc., 1729 E. St., N. W., 
Washington 6, D. C. 


